Hadoop performance benchmarking with TestDFSIO

Posted to common-user@hadoop.apache.org by Sameer Farooqui <ca...@gmail.com> on 2011/09/28 23:45:10 UTC

Hi everyone,

I'm looking for some recommendations for how to get our Hadoop cluster to do
faster I/O.

Currently, our lab cluster is 8 worker nodes and 1 master node (with
NameNode and JobTracker).

Each worker node has:
- 48 GB RAM
- 16 processors (Intel Xeon E5630 @ 2.53 GHz)
- 1 Gb Ethernet connection


Due to company policy, we have to keep the HDFS storage on a disk array. Our
SAS-connected array can deliver 6 Gb/s (768 MB/s) to each of the 8 hosts. So,
theoretically, we should be able to get a max of about 6 GB/s of simultaneous
reads across the 8 nodes if we benchmark it.
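
Spelling that out:

  6 Gb/s = 6,144 Mb/s per host; divided by 8 bits/byte = 768 MB/s per host
  768 MB/s * 8 hosts = 6,144 MB/s, i.e. about 6 GB/s aggregate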

Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN
is RAID-5 across 12 disks on the array. That LUN is partitioned on the
server into 6 different devices like this:

>> df -h

Filesystem            Size  Used Avail Use% Mounted on
/dev/sdg1             3.5T  445G  2.9T  14% /data2/d1
/dev/sdg2             3.5T  439G  2.9T  14% /data2/d2
/dev/sdg3             3.5T  436G  2.9T  13% /data2/d3
/dev/sdg4             3.5T  435G  2.9T  13% /data2/d4
/dev/sdg5             3.5T  434G  2.9T  13% /data2/d5
/dev/sdg6             3.5T  431G  2.9T  13% /data2/d6

The file system type is ext3.
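
For reference, the six mounts feed HDFS via dfs.data.dir in hdfs-site.xml,
along these lines (exact subdirectory names are illustrative):

<property>
  <name>dfs.data.dir</name>
  <value>/data2/d1/dfs/data,/data2/d2/dfs/data,/data2/d3/dfs/data,/data2/d4/dfs/data,/data2/d5/dfs/data,/data2/d6/dfs/data</value>
</property>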

So, when we run TestDFSIO, here are the results:

*++ Write ++*
>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write -nrFiles 80 -fileSize 10000

11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
11/09/27 18:54:53 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 18:54:53 EDT 2011
11/09/27 18:54:53 INFO fs.TestDFSIO:        Number of files: 80
11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/27 18:54:53 INFO fs.TestDFSIO:      Throughput mb/sec: 8.2742240008678
11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125
11/09/27 18:54:53 INFO fs.TestDFSIO:  IO rate std deviation: 0.3435565217052116
11/09/27 18:54:53 INFO fs.TestDFSIO:     Test exec time sec: 1427.856

So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.
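
(If you want to reproduce the math: TestDFSIO appends each run to
TestDFSIO_results.log in the working directory by default, so something like
the awk below, which multiplies the last run's per-task throughput by the
file count, gives the same aggregate figure:)

>> awk '/Number of files/ {n=$NF} /Throughput/ {t=$NF} END {printf "%.0f MB/s aggregate\n", n*t}' TestDFSIO_results.log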


*++ Read ++*
>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read -nrFiles 80 -fileSize 10000

11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
11/09/27 19:43:12 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 19:43:12 EDT 2011
11/09/27 19:43:12 INFO fs.TestDFSIO:        Number of files: 80
11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/27 19:43:12 INFO fs.TestDFSIO:      Throughput mb/sec: 5.854318503905489
11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833
11/09/27 19:43:12 INFO fs.TestDFSIO:  IO rate std deviation: 0.9885505979030621
11/09/27 19:43:12 INFO fs.TestDFSIO:     Test exec time sec: 2055.465


So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.


*Question 1:* Why are the reads and writes so much slower than expected? Any
suggestions about what can be changed? I understand that RAID-5 backed disks
are an unorthodox configuration for HDFS, but has anybody successfully done
this? If so, what kind of results did you see?



Also, we detached the 8 nodes from the disk array and connected each of them
to 6 local hard drives for testing (w/ ext4 file system). Then we ran the
same read TestDFSIO and saw this:

11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
11/09/26 20:24:09 INFO fs.TestDFSIO:            Date & time: Mon Sep 26 20:24:09 EDT 2011
11/09/26 20:24:09 INFO fs.TestDFSIO:        Number of files: 80
11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/26 20:24:09 INFO fs.TestDFSIO:      Throughput mb/sec: 13.065623285187982
11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664
11/09/26 20:24:09 INFO fs.TestDFSIO:  IO rate std deviation: 8.000530562022949
11/09/26 20:24:09 INFO fs.TestDFSIO:     Test exec time sec: 1123.447


So, with local disks, reads are about 1 GB per second across the 8 nodes.
Much faster!


With 6 local disks, writes performed the same though:

11/09/26 19:49:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
11/09/26 19:49:58 INFO fs.TestDFSIO:            Date & time: Mon Sep 26 19:49:58 EDT 2011
11/09/26 19:49:58 INFO fs.TestDFSIO:        Number of files: 80
11/09/26 19:49:58 INFO fs.TestDFSIO: Total MBytes processed: 800000
11/09/26 19:49:58 INFO fs.TestDFSIO:      Throughput mb/sec: 8.573949802610528
11/09/26 19:49:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.588902473449707
11/09/26 19:49:58 INFO fs.TestDFSIO:  IO rate std deviation: 0.3639466752546032
11/09/26 19:49:58 INFO fs.TestDFSIO:     Test exec time sec: 1383.734


Write throughput across the cluster was 685 MB per second.


By the way, our HDFS file system is healthy:
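
>> hadoop fsck /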

Status: HEALTHY
 Total size:    9018951544337 B
 Total dirs:    24230
 Total files:   1032578
 Total blocks (validated):      1139580 (avg. block size 7914276 B)
 Minimally replicated blocks:   1139580 (100.0 %)
 Over-replicated blocks:        1 (8.775163E-5 %)
 Under-replicated blocks:       16 (0.001404026 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    2
 Average block replication:     2.0122387
 Corrupt blocks:                0
 Missing replicas:              32 (0.0013954865 %)
 Number of data-nodes:          8
 Number of racks:               1
FSCK ended at Tue Sep 27 18:57:23 EDT 2011 in 7453 milliseconds


The filesystem under path '/' is HEALTHY


- Sameer

Re: Hadoop performance benchmarking with TestDFSIO

Posted by Steve Loughran <st...@apache.org>.
On 28/09/11 22:45, Sameer Farooqui wrote:
> Hi everyone,
>
> I'm looking for some recommendations for how to get our Hadoop cluster to do
> faster I/O.
>
> Currently, our lab cluster is 8 worker nodes and 1 master node (with
> NameNode and JobTracker).
>
> Each worker node has:
> - 48 GB RAM
> - 16 processors (Intel Xeon E5630 @ 2.53 GHz)
> - 1 Gb Ethernet connection
>
>
> Due to company policy, we have to keep the HDFS storage on a disk array. Our
> SAS-connected array can deliver 6 Gb/s (768 MB/s) to each of the 8 hosts. So,
> theoretically, we should be able to get a max of about 6 GB/s of simultaneous
> reads across the 8 nodes if we benchmark it.

You're missing the point of Hadoop there: you will end up getting the
bandwidth of the disk most likely to fail next, HDFS replication on top of
the array's RAID is overkill, and you will hit limits of scale, both
technical (SAN scalability) and financial.

>
> Our disk array is presenting each of the 8 nodes with a 21 TB LUN. The LUN
> is RAID-5 across 12 disks on the array. That LUN is partitioned on the
> server into 6 different devices like this:
>



> The file system type is ext3.

set noatime
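
e.g. in /etc/fstab, one line per data mount (devices and mount points taken
from your df listing; illustrative):

/dev/sdg1  /data2/d1  ext3  defaults,noatime  0 0

or on a live system:

mount -o remount,noatime /data2/d1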

>
> So, when we run TestDFSIO, here are the results:
>
> *++ Write ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -write -nrFiles 80 -fileSize 10000
>
> 11/09/27 18:54:53 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/27 18:54:53 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 18:54:53 EDT 2011
> 11/09/27 18:54:53 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 18:54:53 INFO fs.TestDFSIO:      Throughput mb/sec: 8.2742240008678
> 11/09/27 18:54:53 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.288116455078125
> 11/09/27 18:54:53 INFO fs.TestDFSIO:  IO rate std deviation: 0.3435565217052116
> 11/09/27 18:54:53 INFO fs.TestDFSIO:     Test exec time sec: 1427.856
>
> So, throughput across all 8 nodes is 8.27 * 80 = 661 MB per second.
>
>
> *++ Read ++*
>>> hadoop jar /usr/lib/hadoop/hadoop-test-0.20.2-CDH3B4.jar TestDFSIO -read -nrFiles 80 -fileSize 10000
>
> 11/09/27 19:43:12 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/27 19:43:12 INFO fs.TestDFSIO:            Date & time: Tue Sep 27 19:43:12 EDT 2011
> 11/09/27 19:43:12 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/27 19:43:12 INFO fs.TestDFSIO:      Throughput mb/sec: 5.854318503905489
> 11/09/27 19:43:12 INFO fs.TestDFSIO: Average IO rate mb/sec: 5.96372652053833
> 11/09/27 19:43:12 INFO fs.TestDFSIO:  IO rate std deviation: 0.9885505979030621
> 11/09/27 19:43:12 INFO fs.TestDFSIO:     Test exec time sec: 2055.465
>
>
> So, throughput across all 8 nodes is 5.85 * 80 = 468 MB per second.
>
>
> *Question 1:* Why are the reads and writes so much slower than expected? Any
> suggestions about what can be changed? I understand that RAID-5 backed disks
> are an unorthodox configuration for HDFS, but has anybody successfully done
> this? If so, what kind of results did you see?


>
>
> Also, we detached the 8 nodes from the disk array and connected each of them
> to 6 local hard drives for testing (w/ ext4 file system). Then we ran the
> same read TestDFSIO and saw this:
>
> 11/09/26 20:24:09 INFO fs.TestDFSIO: ----- TestDFSIO ----- : read
> 11/09/26 20:24:09 INFO fs.TestDFSIO:            Date & time: Mon Sep 26 20:24:09 EDT 2011
> 11/09/26 20:24:09 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 20:24:09 INFO fs.TestDFSIO:      Throughput mb/sec: 13.065623285187982
> 11/09/26 20:24:09 INFO fs.TestDFSIO: Average IO rate mb/sec: 15.160531997680664
> 11/09/26 20:24:09 INFO fs.TestDFSIO:  IO rate std deviation: 8.000530562022949
> 11/09/26 20:24:09 INFO fs.TestDFSIO:     Test exec time sec: 1123.447
>
>
> So, with local disks, reads are about 1 GB per second across the 8 nodes.
> Much faster!

Much lower cost per TB too. Orders of magnitude lower.

>
> With 6 local disks, writes performed the same though:
>
> 11/09/26 19:49:58 INFO fs.TestDFSIO: ----- TestDFSIO ----- : write
> 11/09/26 19:49:58 INFO fs.TestDFSIO:            Date & time: Mon Sep 26 19:49:58 EDT 2011
> 11/09/26 19:49:58 INFO fs.TestDFSIO:        Number of files: 80
> 11/09/26 19:49:58 INFO fs.TestDFSIO: Total MBytes processed: 800000
> 11/09/26 19:49:58 INFO fs.TestDFSIO:      Throughput mb/sec: 8.573949802610528
> 11/09/26 19:49:58 INFO fs.TestDFSIO: Average IO rate mb/sec: 8.588902473449707
> 11/09/26 19:49:58 INFO fs.TestDFSIO:  IO rate std deviation: 0.3639466752546032
> 11/09/26 19:49:58 INFO fs.TestDFSIO:     Test exec time sec: 1383.734
>
>
> Write throughput across the cluster was 685 MB per second.

Writes get streamed to multiple HDFS nodes for redundancy; you've got the
network overhead on top of the disk bandwidth, and with your replication
factor of 2, twice the data to physically write.
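
Back of the envelope: 685 MB/s over 8 nodes is ~86 MB/s of user data per
node. With replication 2, each node is writing ~171 MB/s to its own disks
and shipping ~86 MB/s of replica traffic over a 1 GbE link with a ~125 MB/s
ceiling, so the NICs, not the disks, dominate writes; that's why local
disks didn't move the write numbers.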


Options:
  - stop using HDFS on the SAN; it's the wrong approach. Mount the SAN
directly and use file:// URLs, and let the SAN do the networking and
redundancy (rough config sketch below).
  - buy some local HDDs, at least for all the temp data: logs and
MapReduce spill space (mapred.local.dir). You don't need redundancy there.
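
Roughly, for the first option (0.20-era property names; values
illustrative), in core-site.xml:

<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>

so job paths resolve against the locally mounted SAN filesystem instead of
HDFS. For the second, point the MapReduce scratch space at the local drives
in mapred-site.xml (mount points below are made up; use your own):

<property>
  <name>mapred.local.dir</name>
  <value>/local/d1/mapred,/local/d2/mapred</value>
</property>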