Posted to hdfs-user@hadoop.apache.org by Jeffrey Buell <jb...@vmware.com> on 2011/07/09 04:58:32 UTC

storage performance problem

I'm seeing an odd storage performance problem that I hope can be fixed with the right configuration parameter, but nothing I've tried so far helps.  These tests were done in a virtual machine running on ESX, but earlier tests on native RHEL showed something similar.

Common configuration:
7 nodes with 10 GbE interconnect.
Each node: 2 socket Westmere, 96 GB, 10 local SATA disks exported to the VM as JBODs, single 92 GB VM.
TestDFSIO: 140 files, 7143 MB each (about 1 TB total data), so 2 map tasks per disk.  Replication=2.
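
As a sanity check on the numbers above, here is a small sketch of the workload arithmetic. It assumes one map task per TestDFSIO file and replicas spread evenly over the nodes; the function and key names are made up for illustration.

```python
# Workload figures copied from the configuration above.
NODES = 7
DISKS_PER_NODE = 10
FILES = 140
FILE_SIZE_MB = 7143
REPLICATION = 2


def workload_summary(nodes=NODES, disks_per_node=DISKS_PER_NODE,
                     files=FILES, file_size_mb=FILE_SIZE_MB,
                     replication=REPLICATION):
    """Derive per-disk and per-node load (assumes one map task per file)."""
    total_disks = nodes * disks_per_node
    logical_mb = files * file_size_mb        # data as seen by the client
    physical_mb = logical_mb * replication   # data actually hitting disks
    return {
        "total_disks": total_disks,
        "map_tasks_per_disk": files / total_disks,
        "logical_mb": logical_mb,
        "physical_mb": physical_mb,
        # Assumes replicas are spread evenly over the nodes.
        "physical_mb_per_node": physical_mb / nodes,
    }


if __name__ == "__main__":
    for key, value in workload_summary().items():
        print(f"{key}: {value}")
```

With replication=2 the disks have to absorb about 2 TB in total, or roughly 286,000 MB per node, even though the test reports about 1 TB of data.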

Case A:  RHEL 5.5, EXT3 file system, write through configured on the physical disk
Case B:  RHEL 6.1, EXT4 FS, write back

Testing with aio-stress shows that the changes made in Case B all improved efficiency and performance.  But running the write test of TestDFSIO on hadoop (using CDH3u0) got worse:

Case A:  580 seconds exec time
Case B:  740 seconds
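
To relate these exec times to disk throughput, the sketch below converts an exec time into an average per-node write rate. It assumes the figures from the configuration above (140 files of 7143 MB, replication=2, 7 nodes) and an even spread of replicas; the function name is hypothetical.

```python
def per_node_write_mb_s(exec_seconds, files=140, file_size_mb=7143,
                        replication=2, nodes=7):
    """Average MB/s each node must write to finish in exec_seconds.

    Assumes replicas are spread evenly and that the whole exec time is
    spent writing, so this is a lower bound on the sustained rate.
    """
    physical_mb = files * file_size_mb * replication
    return physical_mb / exec_seconds / nodes


if __name__ == "__main__":
    for label, seconds in (("Case A", 580), ("Case B", 740)):
        print(f"{label}: {per_node_write_mb_s(seconds):.0f} MB/s per node")
```

Under those assumptions Case A averages roughly 493 MB/s per node and Case B roughly 386 MB/s. The exec time also includes job startup and the tail of the run, so the average will sit somewhat below the steady-state rate.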

I can improve Case B to 710 seconds by going back to EXT3, or by mounting EXT4 with min_batch_time=2000, so slowing down the FS improves hadoop performance.

Both cases show a peak write throughput of about 550 MB/s on each node.  The difference is that in Case A the throughput is steady and doesn't drop below 500 MB/s, but in Case B it is very noisy, sometimes dropping all the way to 0.  It is also sometimes periodic, rising and falling with a 15-30 second period, and that period is synchronized across all the nodes.  550 MB/s appears to be a controller limit; each disk alone is capable of 130 MB/s with a raw partition or EXT4, and about 100 MB/s with EXT3.  I tried replication=1 to eliminate nearly all networking, but storage throughput was still not steady.
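
For anyone who wants to reproduce the throughput traces, here is a minimal sketch of one way to sample per-node write throughput from /proc/diskstats on Linux. This is not necessarily how the numbers above were collected; the device names (sda, sdb) and function names are placeholders, and MB here means 10^6 bytes.

```python
import time

# /proc/diskstats reports sectors in 512-byte units regardless of device.
SECTOR_BYTES = 512


def sectors_written(diskstats_text, devices):
    """Sum the 'sectors written' counter (10th field) over the given devices."""
    total = 0
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[2] in devices:
            total += int(fields[9])
    return total


def write_mb_s(before_text, after_text, interval_s, devices):
    """Aggregate write rate in MB/s between two diskstats snapshots."""
    delta = (sectors_written(after_text, devices)
             - sectors_written(before_text, devices))
    return delta * SECTOR_BYTES / 1e6 / interval_s


def sample(devices, interval_s=1.0, count=60, path="/proc/diskstats"):
    """Yield one MB/s reading per interval (Linux only)."""
    with open(path) as f:
        prev = f.read()
    for _ in range(count):
        time.sleep(interval_s)
        with open(path) as f:
            cur = f.read()
        yield write_mb_s(prev, cur, interval_s, devices)
        prev = cur


if __name__ == "__main__":
    # Placeholder device names; substitute the node's data disks.
    for rate in sample({"sda", "sdb"}, interval_s=1.0, count=10):
        print(f"{rate:.1f} MB/s")
```

Running this on every node at one-second intervals should make both the drops to 0 and the 15-30 second period visible, and the timestamps can be compared across nodes to check the synchronization.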

I'm thinking that faster storage somehow confuses the scheduler, but I don't see what the mechanism is.  Any ideas what's going on or things to try?  I don't want to have to recommend de-tuning storage in order to get hadoop to behave.

Thanks for the help,

Jeff