Posted to hdfs-user@hadoop.apache.org by Ferdy Galema <fe...@kalooga.com> on 2011/05/05 22:45:56 UTC

our experiences with various filesystems and tuning options

Hi,

We've performed tests on ext3 and xfs filesystems using different 
settings. The results might be useful to others.

The datanode cluster consists of 15 slave nodes, each equipped with 
1Gbit Ethernet, an X3220@2.40GHz quad-core CPU and 4x1TB disks. Disk 
read speeds vary from about 90 to 130MB/s (tested using hdparm -t).

Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org) 
(gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))

#our command
for i in `seq 1 10`; do
  ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar randomwriter -Ddfs.replication=1 /rand$i && \
  ./hadoop fs -rmr /rand$i/_logs /rand$i/_SUCCESS && \
  ./hadoop distcp -Ddfs.replication=1 /rand$i /rand-copy$i
done

Our benchmark consists of a standard random-writer job followed by a 
distcp of the same data, both using a replication factor of 1. This is 
to make sure only the disks get hit. Each benchmark is run several 
times for every configuration. Because of the occasional hiccup, I 
will list both the average and the fastest times for each 
configuration. I read the execution times off the jobtracker.

The configurations (with execution times in seconds, as Avg-writer / 
Min-writer / Avg-distcp / Min-distcp):
ext3-default      158 / 136 / 411 / 343
ext3-tuned        159 / 132 / 330 / 297
ra1024 ext3-tuned 159 / 132 / 292 / 264
ra1024 xfs-tuned  128 / 122 / 220 / 202

To explain: ext3-tuned means mounting with the options 
[noatime,nodiratime,data=writeback,rw], and ra1024 means a read-ahead 
buffer of 1024 blocks. The xfs disks are created with the mkfs options 
[size=128m,lazy-count=1] and mounted with [noatime,nodiratime,logbufs=8].
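
For concreteness, a minimal sketch of how these settings could be 
applied to a single data disk. The device name /dev/sdb and mount 
point /data/1 are purely illustrative, I'm assuming size=128m and 
lazy-count=1 are the log-section options of mkfs.xfs, and the 
read-ahead could just as well be set via /sys/block/*/queue/read_ahead_kb:

#format and mount one xfs data disk with the tuned options
mkfs.xfs -f -l size=128m,lazy-count=1 /dev/sdb1
mount -o noatime,nodiratime,logbufs=8 /dev/sdb1 /data/1

#set a read-ahead of 1024 blocks (512-byte sectors) and verify it
blockdev --setra 1024 /dev/sdb
blockdev --getra /dev/sdb

#the ext3-tuned variant would instead be mounted with
#mount -o noatime,nodiratime,data=writeback,rw /dev/sdb1 /data/1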

In conclusion, it seems that using tuned xfs filesystems combined with 
increased read-ahead buffers improved our basic hdfs performance by 
about 10% (random-writer) to 40% (distcp).

Hopefully this is useful to someone. Although I won't be performing 
more tests soon, I'd be happy to provide more details.
Ferdy.

Re: our experiences with various filesystems and tuning options

Posted by Marcos Ortiz <ml...@uci.cu>.
On 05/10/2011 06:29 AM, Rita wrote:
> I keep asking because I wasn't able to use a XFS filesystem larger 
> than 3-4TB. If the XFS file system is larger than 4TB hdfs won't 
> recognize the space. I am on a 64bit RHEL 5.3 host.
>
>
> On Tue, May 10, 2011 at 6:30 AM, Will Maier <wcmaier@hep.wisc.edu 
> <ma...@hep.wisc.edu>> wrote:
>
>     On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
>     > what filesystem are they using and what is the size of each
>     filesystem?
>
>     It sounds nuts, but each disk has its own ext3 filesystem. Beyond
>     switching to
>     the deadline IO scheduler, we haven't done much tuning/tweaking. A
>     script runs
>     every ten minutes to test all of the data mounts and reconfigure
>     hdfs-site.xml
>     and restart the datanode if necessary. So far, this approach has
>     allowed us to
>     avoid loss of space to RAID without correlating the risk of disk
>     failure by
>     building larger RAID0s.
>
>     In the future, we expect to deprecate the script and rely on the
>     datanode process
>     itself to handle missing/failing disks.
>
>     --
>
>     Will Maier - UW High Energy Physics
>     cel: 608.438.6162 <tel:608.438.6162>
>     tel: 608.263.9692 <tel:608.263.9692>
>     web: http://www.hep.wisc.edu/~wcmaier/
>
>
>
>
> -- 
> --- Get your facts first, then you can distort them as you please.--
I've seen this problem before with the 64-bit version of Red Hat EL 5.3.
Which kernel version are you using?

Can you upgrade the system to 5.5 or to 6.0? There are a lot of bug 
fixes and performance gains in these releases.
Another point is that since version 5.4, Red Hat has shipped 
preliminary XFS support specifically to address the need for larger 
filesystems, and their RHEL 6 release treats it as a fully supported 
filesystem on par with ext3 and ext4.

One last point: XFS can handle files greater than 16 TB. The primary 
problem is the tools to read and write those files. (ext4 can in 
principle handle such huge files too, but the problem is that the mkfs 
utility is not optimized for this.)
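
As a quick sanity check for the "hdfs won't recognize the space" 
symptom, it may help to compare what the OS and what HDFS report for 
the same disk; the mount point /data/1 below is illustrative:

#what the OS sees for one data mount
df -h /data/1
xfs_info /data/1

#what HDFS reports; compare the per-datanode Configured Capacity
#against the df numbers above
hadoop dfsadmin -report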

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz


Re: our experiences with various filesystems and tuning options

Posted by Allen Wittenauer <aw...@apache.org>.
On May 10, 2011, at 6:14 AM, Marcos Ortiz wrote:
> My preferred filesystem is ZFS; it's a shame that the Linux support is still very immature. For that reason, I moved my PostgreSQL hosts to FreeBSD-8.0 to use ZFS as the filesystem, and it really rocks.
> 
> Has anyone tested a Hadoop cluster with this filesystem?
> On Solaris or FreeBSD?

	HDFS capacity numbers go really wonky on pooled storage systems like ZFS.  Other than that, performance is more than acceptable vs. ext4.  [Sorry, I don't have my benchmark numbers handy.]

Re: our experiences with various filesystems and tuning options

Posted by Marcos Ortiz <ml...@uci.cu>.
On 05/10/2011 06:56 AM, Jonathan Disher wrote:
> In a previous life, I've had extreme problems with XFS, including 
> kernel panics and data loss under high load.
>
> Those were database servers, not Hadoop nodes, and it was a few years 
> ago.  But, ext3/ext4 seems to be stable enough, and it's more widely 
> supported, so it's my preference.
>
> -j
>
> On May 10, 2011, at 3:59 AM, Rita wrote:
>
>> I keep asking because I wasn't able to use a XFS filesystem larger 
>> than 3-4TB. If the XFS file system is larger than 4TB hdfs won't 
>> recognize the space. I am on a 64bit RHEL 5.3 host.
>>
>>
>> On Tue, May 10, 2011 at 6:30 AM, Will Maier <wcmaier@hep.wisc.edu 
>> <ma...@hep.wisc.edu>> wrote:
>>
>>     On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
>>     > what filesystem are they using and what is the size of each
>>     filesystem?
>>
>>     It sounds nuts, but each disk has its own ext3 filesystem. Beyond
>>     switching to
>>     the deadline IO scheduler, we haven't done much tuning/tweaking.
>>     A script runs
>>     every ten minutes to test all of the data mounts and reconfigure
>>     hdfs-site.xml
>>     and restart the datanode if necessary. So far, this approach has
>>     allowed us to
>>     avoid loss of space to RAID without correlating the risk of disk
>>     failure by
>>     building larger RAID0s.
>>
>>     In the future, we expect to deprecate the script and rely on the
>>     datanode process
>>     itself to handle missing/failing disks.
>>
>>     --
>>
>>     Will Maier - UW High Energy Physics
>>     cel: 608.438.6162 <tel:608.438.6162>
>>     tel: 608.263.9692 <tel:608.263.9692>
>>     web: http://www.hep.wisc.edu/~wcmaier/
>>
>>
>>
>>
>> -- 
>> --- Get your facts first, then you can distort them as you please.--
>
Jonathan, I had the same issues on my PostgreSQL servers, and the main 
cause was the kernel version I was using.
I upgraded the kernel to the latest version supported by Red Hat, and 
everything worked fine.

My preferred filesystem is ZFS; it's a shame that the Linux support is 
still very immature. For that reason, I moved my PostgreSQL hosts to 
FreeBSD-8.0 to use ZFS as the filesystem, and it really rocks.

Has anyone tested a Hadoop cluster with this filesystem?
On Solaris or FreeBSD?

Regards

-- 
Marcos Luís Ortíz Valmaseda
  Software Engineer (Large-Scaled Distributed Systems)
  University of Information Sciences,
  La Habana, Cuba
  Linux User # 418229
  http://about.me/marcosortiz


Re: our experiences with various filesystems and tuning options

Posted by Jonathan Disher <jd...@parad.net>.
In a previous life, I've had extreme problems with XFS, including kernel panics and data loss under high load.

Those were database servers, not Hadoop nodes, and it was a few years ago.  But, ext3/ext4 seems to be stable enough, and it's more widely supported, so it's my preference.

-j

On May 10, 2011, at 3:59 AM, Rita wrote:

> I keep asking because I wasn't able to use a XFS filesystem larger than 3-4TB. If the XFS file system is larger than 4TB hdfs won't recognize the space. I am on a 64bit RHEL 5.3 host.
> 
> 
> On Tue, May 10, 2011 at 6:30 AM, Will Maier <wc...@hep.wisc.edu> wrote:
> On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
> > what filesystem are they using and what is the size of each filesystem?
> 
> It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to
> the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs
> every ten minutes to test all of the data mounts and reconfigure hdfs-site.xml
> and restart the datanode if necessary. So far, this approach has allowed us to
> avoid loss of space to RAID without correlating the risk of disk failure by
> building larger RAID0s.
> 
> In the future, we expect to deprecate the script and rely on the datanode process
> itself to handle missing/failing disks.
> 
> --
> 
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
> 
> 
> 
> -- 
> --- Get your facts first, then you can distort them as you please.--


Re: our experiences with various filesystems and tuning options

Posted by Rita <rm...@gmail.com>.
I keep asking because I wasn't able to use an XFS filesystem larger than
3-4TB. If the XFS filesystem is larger than 4TB, hdfs won't recognize the
space. I am on a 64-bit RHEL 5.3 host.


On Tue, May 10, 2011 at 6:30 AM, Will Maier <wc...@hep.wisc.edu> wrote:

> On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
> > what filesystem are they using and what is the size of each filesystem?
>
> It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching
> to
> the deadline IO scheduler, we haven't done much tuning/tweaking. A script
> runs
> every ten minutes to test all of the data mounts and reconfigure
> hdfs-site.xml
> and restart the datanode if necessary. So far, this approach has allowed us
> to
> avoid loss of space to RAID without correlating the risk of disk failure by
> building larger RAID0s.
>
> In the future, we expect to deprecate the script and rely on the datanode
> process
> itself to handle missing/failing disks.
>
> --
>
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
>



-- 
--- Get your facts first, then you can distort them as you please.--

Re: our experiences with various filesystems and tuning options

Posted by Will Maier <wc...@hep.wisc.edu>.
On Tue, May 10, 2011 at 12:03:09AM -0400, Rita wrote:
> what filesystem are they using and what is the size of each filesystem?

It sounds nuts, but each disk has its own ext3 filesystem. Beyond switching to
the deadline IO scheduler, we haven't done much tuning/tweaking. A script runs
every ten minutes to test all of the data mounts and reconfigure hdfs-site.xml
and restart the datanode if necessary. So far, this approach has allowed us to
avoid loss of space to RAID without correlating the risk of disk failure by
building larger RAID0s.

In the future, we expect to deprecate the script and rely on the datanode process
itself to handle missing/failing disks.
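
A rough sketch of what such a check could look like is below. This is 
not the actual script; the mount points, the hdfs-site.xml templating 
and the restart command are all illustrative assumptions:

#!/bin/bash
#hypothetical periodic (e.g. cron) check of the datanode data mounts
CONF=/etc/hadoop/conf/hdfs-site.xml
STATE=/var/run/datanode-data-dirs
GOOD=""
for d in /data/1 /data/2 /data/3 /data/4; do
  #a mount counts as healthy if it is mounted and writable
  if mountpoint -q "$d" && touch "$d/.probe" 2>/dev/null; then
    rm -f "$d/.probe"
    GOOD="${GOOD:+$GOOD,}$d/hdfs"
  fi
done

#only act when the set of healthy mounts changed since the last run
if [ "$GOOD" != "$(cat $STATE 2>/dev/null)" ]; then
  echo "$GOOD" > "$STATE"
  #regenerate dfs.data.dir in hdfs-site.xml from an assumed template
  sed "s|@DATA_DIRS@|$GOOD|" "$CONF.tmpl" > "$CONF"
  /etc/init.d/hadoop-0.20-datanode restart
fi

For the deadline scheduler itself, something along the lines of
"echo deadline > /sys/block/sdb/queue/scheduler" per disk (or the
elevator=deadline kernel boot parameter) is the usual route.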

-- 

Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/

Re: our experiences with various filesystems and tuning options

Posted by Jonathan Disher <jd...@parad.net>.
This cluster is specifically a near-line archive cluster, so storage density is more important than computational performance.  Our primary production cluster (which actually does very little in the way of computation) is composed of Dell R510s with 10 disks in JBOD and a two-disk mirrored OS drive.  48 of those make a nice, speedy cluster.

-j

On May 10, 2011, at 1:57 PM, Allen Wittenauer wrote:

> 
> On May 9, 2011, at 11:46 PM, Jonathan Disher wrote:
> 
>> I cant speak for Will, but I'm actually going against recommendations, my systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per array.
>> 
>> The problems you will encounter keeping machines performing well after they get internally unbalanced following disk failures and replacements (and keeping machines online with non-standard configs, missing disks, etc) will drive you nuts.  It drives me nuts.
> 
> 	This sounds more like you just don't have enough nodes if you are that concerned about single machine performance. :)
> 


Re: our experiences with various filesystems and tuning options

Posted by Allen Wittenauer <aw...@apache.org>.
On May 9, 2011, at 11:46 PM, Jonathan Disher wrote:

> I cant speak for Will, but I'm actually going against recommendations, my systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per array.
> 
> The problems you will encounter keeping machines performing well after they get internally unbalanced following disk failures and replacements (and keeping machines online with non-standard configs, missing disks, etc) will drive you nuts.  It drives me nuts.

	This sounds more like you just don't have enough nodes if you are that concerned about single machine performance. :)



Re: our experiences with various filesystems and tuning options

Posted by Jonathan Disher <jd...@parad.net>.
I can't speak for Will, but I'm actually going against recommendations: my systems have three 20TB RAID 6 arrays, with two 10TB ext4 filesystems per array.

The problems you will encounter keeping machines performing well after they get internally unbalanced following disk failures and replacements (and keeping machines online with non-standard configs, missing disks, etc) will drive you nuts.  It drives me nuts.

-j

On May 9, 2011, at 9:03 PM, Rita wrote:

> what filesystem are they using and what is the size of each filesystem?
> 
> 
> On Mon, May 9, 2011 at 9:22 PM, Will Maier <wc...@hep.wisc.edu> wrote:
> On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote:
> > Speak for yourself, I just built a bunch of 36 disk datanodes :)
> 
> And I just unboxed 10 more 36 disk systems to join the two already in our
> cluster. We also have 20 systems with 24 disks, though most of our datanodes are
> have more typical four disks...
> 
> --
> 
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
> 
> 
> 
> -- 
> --- Get your facts first, then you can distort them as you please.--


Re: our experiences with various filesystems and tuning options

Posted by Rita <rm...@gmail.com>.
what filesystem are they using and what is the size of each filesystem?


On Mon, May 9, 2011 at 9:22 PM, Will Maier <wc...@hep.wisc.edu> wrote:

> On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote:
> > Speak for yourself, I just built a bunch of 36 disk datanodes :)
>
> And I just unboxed 10 more 36 disk systems to join the two already in our
> cluster. We also have 20 systems with 24 disks, though most of our
> datanodes are
> have more typical four disks...
>
> --
>
> Will Maier - UW High Energy Physics
> cel: 608.438.6162
> tel: 608.263.9692
> web: http://www.hep.wisc.edu/~wcmaier/
>



-- 
--- Get your facts first, then you can distort them as you please.--

Re: our experiences with various filesystems and tuning options

Posted by Will Maier <wc...@hep.wisc.edu>.
On Mon, May 09, 2011 at 05:07:29PM -0700, Jonathan Disher wrote:
> Speak for yourself, I just built a bunch of 36 disk datanodes :)

And I just unboxed 10 more 36-disk systems to join the two already in our
cluster. We also have 20 systems with 24 disks, though most of our datanodes
have the more typical four disks...

-- 

Will Maier - UW High Energy Physics
cel: 608.438.6162
tel: 608.263.9692
web: http://www.hep.wisc.edu/~wcmaier/

Re: our experiences with various filesystems and tuning options

Posted by Jonathan Disher <jd...@parad.net>.
Speak for yourself, I just built a bunch of 36 disk datanodes :)

-j

On May 9, 2011, at 2:33 AM, Eric wrote:

> Just a small warning: I've seen kernel panics with the XFS kernel module once you have many disks (in my case: > 20 disks). This is an exotic amount of disks to put in one server so it shouldn't hold anyone back from using XFS :-)
> 
> 2011/5/7 Rita <rm...@gmail.com>
> Sheng,
> 
> How big is your each XFS volume? We noticed if its over 4TB hdfs won't pick it up.
> 
> 
> 2011/5/6 Ferdy Galema <fe...@kalooga.com>
> No unfortunately not, we couldn't because of our kernel versions.
> 
> 
> On 05/06/2011 04:00 AM, ShengChang Gu wrote:
>> 
>> Many thanks.
>> 
>> We use xfs all the time.Have you try the ext4 filesystem?
>> 
>> 2011/5/6 Ferdy Galema <fe...@kalooga.com>
>> Hi,
>> 
>> We've performed tests for ext3 and xfs filesystems using different settings. The results might be useful for anyone else.
>> 
>> The datanode cluster consists of 15 slave nodes, each equipped with 1Gbit ethernet, X3220@2.40GHz quadcores and 4x1TB disks. The disk read speeds vary from about 90 to 130MB/s. (Tested using hdparm -t).
>> 
>> Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
>> OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))
>> 
>> #our command
>> for i in `seq 1 10`; do ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar randomwriter -Ddfs.replication=1 /rand$i && ./hadoop fs -rmr /rand$i/_logs /rand$i/_SUCCESS && ./hadoop distcp -Ddfs.replication=1 /rand$i /rand-copy$i; done
>> 
>> Our benchmark consists of a standard random-writer job followed by a distcp of the same data, both using a replication of 1. This is to make sure only the disks get hit. Each benchmark is ran several times for every configuration. Because of the occasional hickup, I will list both the average and the fastest times for each configuration. I read the execution times off the jobtracker.
>> 
>> The configurations (with exection times in seconds of Avg-writer / Min-writer / Avg-distcp / Min-distcp)
>> ext3-default      158 / 136 / 411 / 343
>> ext3-tuned        159 / 132 / 330 / 297
>> ra1024 ext3-tuned 159 / 132 / 292 / 264
>> ra1024 xfs-tuned  128 / 122 / 220 / 202
>> 
>> To explain, ext3-tuned is with tuned mount options [noatime,nodiratime,data=writeback,rw] and ra1024 means a read-ahead buffer of 1024 blocks. The xfs disks are created using mkfs options [size=128m,lazy-count=1] and mount options [noatime,nodiratime,logbufs=8].
>> 
>> In conclusion it seems that using tuned xfs filesystems combined with increased read-ahead buffers increased our basic hdfs performance with about 10% (random-writer) to 40% (distcp).
>> 
>> Hopefully this is useful to anyone. Although I won't be performing more tests soon I'd be happy to provide more details.
>> Ferdy.
>> 
>> 
>> 
>> -- 
>> 阿昌
> 
> 
> 
> -- 
> --- Get your facts first, then you can distort them as you please.--
> 


Re: our experiences with various filesystems and tuning options

Posted by Eric <er...@gmail.com>.
Just a small warning: I've seen kernel panics with the XFS kernel module
once you have many disks (in my case: > 20 disks). This is an exotic number
of disks to put in one server, so it shouldn't hold anyone back from using
XFS :-)

2011/5/7 Rita <rm...@gmail.com>

> Sheng,
>
> How big is your each XFS volume? We noticed if its over 4TB hdfs won't pick
> it up.
>
>
> 2011/5/6 Ferdy Galema <fe...@kalooga.com>
>
>>  No unfortunately not, we couldn't because of our kernel versions.
>>
>>
>> On 05/06/2011 04:00 AM, ShengChang Gu wrote:
>>
>> Many thanks.
>>
>> We use xfs all the time.Have you try the ext4 filesystem?
>>
>> 2011/5/6 Ferdy Galema <fe...@kalooga.com>
>>
>>> Hi,
>>>
>>> We've performed tests for ext3 and xfs filesystems using different
>>> settings. The results might be useful for anyone else.
>>>
>>> The datanode cluster consists of 15 slave nodes, each equipped with 1Gbit
>>> ethernet, X3220@2.40GHz quadcores and 4x1TB disks. The disk read speeds
>>> vary from about 90 to 130MB/s. (Tested using hdparm -t).
>>>
>>> Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
>>> OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org)
>>> (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))
>>>
>>> #our command
>>> for i in `seq 1 10`; do ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar
>>> randomwriter -Ddfs.replication=1 /rand$i && ./hadoop fs -rmr /rand$i/_logs
>>> /rand$i/_SUCCESS && ./hadoop distcp -Ddfs.replication=1 /rand$i
>>> /rand-copy$i; done
>>>
>>> Our benchmark consists of a standard random-writer job followed by a
>>> distcp of the same data, both using a replication of 1. This is to make sure
>>> only the disks get hit. Each benchmark is ran several times for every
>>> configuration. Because of the occasional hickup, I will list both the
>>> average and the fastest times for each configuration. I read the execution
>>> times off the jobtracker.
>>>
>>> The configurations (with exection times in seconds of Avg-writer /
>>> Min-writer / Avg-distcp / Min-distcp)
>>> ext3-default      158 / 136 / 411 / 343
>>> ext3-tuned        159 / 132 / 330 / 297
>>> ra1024 ext3-tuned 159 / 132 / 292 / 264
>>> ra1024 xfs-tuned  128 / 122 / 220 / 202
>>>
>>> To explain, ext3-tuned is with tuned mount options
>>> [noatime,nodiratime,data=writeback,rw] and ra1024 means a read-ahead buffer
>>> of 1024 blocks. The xfs disks are created using mkfs options
>>> [size=128m,lazy-count=1] and mount options [noatime,nodiratime,logbufs=8].
>>>
>>> In conclusion it seems that using tuned xfs filesystems combined with
>>> increased read-ahead buffers increased our basic hdfs performance with about
>>> 10% (random-writer) to 40% (distcp).
>>>
>>> Hopefully this is useful to anyone. Although I won't be performing more
>>> tests soon I'd be happy to provide more details.
>>>  Ferdy.
>>>
>>
>>
>>
>> --
>> 阿昌
>>
>>
>
>
> --
> --- Get your facts first, then you can distort them as you please.--
>

Re: our experiences with various filesystems and tuning options

Posted by Rita <rm...@gmail.com>.
Sheng,

How big is each of your XFS volumes? We noticed that if it's over 4TB, hdfs
won't pick it up.


2011/5/6 Ferdy Galema <fe...@kalooga.com>

>  No unfortunately not, we couldn't because of our kernel versions.
>
>
> On 05/06/2011 04:00 AM, ShengChang Gu wrote:
>
> Many thanks.
>
> We use xfs all the time.Have you try the ext4 filesystem?
>
> 2011/5/6 Ferdy Galema <fe...@kalooga.com>
>
>> Hi,
>>
>> We've performed tests for ext3 and xfs filesystems using different
>> settings. The results might be useful for anyone else.
>>
>> The datanode cluster consists of 15 slave nodes, each equipped with 1Gbit
>> ethernet, X3220@2.40GHz quadcores and 4x1TB disks. The disk read speeds
>> vary from about 90 to 130MB/s. (Tested using hdparm -t).
>>
>> Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
>> OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org)
>> (gcc version 4.1.2 20080704 (Red Hat 4.1.2-50))
>>
>> #our command
>> for i in `seq 1 10`; do ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar
>> randomwriter -Ddfs.replication=1 /rand$i && ./hadoop fs -rmr /rand$i/_logs
>> /rand$i/_SUCCESS && ./hadoop distcp -Ddfs.replication=1 /rand$i
>> /rand-copy$i; done
>>
>> Our benchmark consists of a standard random-writer job followed by a
>> distcp of the same data, both using a replication of 1. This is to make sure
>> only the disks get hit. Each benchmark is ran several times for every
>> configuration. Because of the occasional hickup, I will list both the
>> average and the fastest times for each configuration. I read the execution
>> times off the jobtracker.
>>
>> The configurations (with exection times in seconds of Avg-writer /
>> Min-writer / Avg-distcp / Min-distcp)
>> ext3-default      158 / 136 / 411 / 343
>> ext3-tuned        159 / 132 / 330 / 297
>> ra1024 ext3-tuned 159 / 132 / 292 / 264
>> ra1024 xfs-tuned  128 / 122 / 220 / 202
>>
>> To explain, ext3-tuned is with tuned mount options
>> [noatime,nodiratime,data=writeback,rw] and ra1024 means a read-ahead buffer
>> of 1024 blocks. The xfs disks are created using mkfs options
>> [size=128m,lazy-count=1] and mount options [noatime,nodiratime,logbufs=8].
>>
>> In conclusion it seems that using tuned xfs filesystems combined with
>> increased read-ahead buffers increased our basic hdfs performance with about
>> 10% (random-writer) to 40% (distcp).
>>
>> Hopefully this is useful to anyone. Although I won't be performing more
>> tests soon I'd be happy to provide more details.
>>  Ferdy.
>>
>
>
>
> --
> 阿昌
>
>


-- 
--- Get your facts first, then you can distort them as you please.--

Re: our experiences with various filesystems and tuning options

Posted by Ferdy Galema <fe...@kalooga.com>.
No, unfortunately not; we couldn't because of our kernel versions.

On 05/06/2011 04:00 AM, ShengChang Gu wrote:
> Many thanks.
>
> We use xfs all the time.Have you try the ext4 filesystem?
>
> 2011/5/6 Ferdy Galema <ferdy.galema@kalooga.com
> <ma...@kalooga.com>>
>
>     Hi,
>
>     We've performed tests for ext3 and xfs filesystems using different
>     settings. The results might be useful for anyone else.
>
>     The datanode cluster consists of 15 slave nodes, each equipped
>     with 1Gbit ethernet, X3220@2.40GHz quadcores and 4x1TB disks. The
>     disk read speeds vary from about 90 to 130MB/s. (Tested using
>     hdparm -t).
>
>     Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
>     OS: Linux version 2.6.18-238.5.1.el5
>     (mockbuild@builder10.centos.org
>     <ma...@builder10.centos.org>) (gcc version 4.1.2
>     20080704 (Red Hat 4.1.2-50))
>
>     #our command
>     for i in `seq 1 10`; do ./hadoop jar
>     ../hadoop-examples-0.20.2-cdh3u0.jar randomwriter
>     -Ddfs.replication=1 /rand$i && ./hadoop fs -rmr /rand$i/_logs
>     /rand$i/_SUCCESS && ./hadoop distcp -Ddfs.replication=1 /rand$i
>     /rand-copy$i; done
>
>     Our benchmark consists of a standard random-writer job followed by
>     a distcp of the same data, both using a replication of 1. This is
>     to make sure only the disks get hit. Each benchmark is ran several
>     times for every configuration. Because of the occasional hickup, I
>     will list both the average and the fastest times for each
>     configuration. I read the execution times off the jobtracker.
>
>     The configurations (with exection times in seconds of Avg-writer /
>     Min-writer / Avg-distcp / Min-distcp)
>     ext3-default 158 / 136 / 411 / 343
>     ext3-tuned 159 / 132 / 330 / 297
>     ra1024 ext3-tuned 159 / 132 / 292 / 264
>     ra1024 xfs-tuned 128 / 122 / 220 / 202
>
>     To explain, ext3-tuned is with tuned mount options
>     [noatime,nodiratime,data=writeback,rw] and ra1024 means a
>     read-ahead buffer of 1024 blocks. The xfs disks are created using
>     mkfs options [size=128m,lazy-count=1] and mount options
>     [noatime,nodiratime,logbufs=8].
>
>     In conclusion it seems that using tuned xfs filesystems combined
>     with increased read-ahead buffers increased our basic hdfs
>     performance with about 10% (random-writer) to 40% (distcp).
>
>     Hopefully this is useful to anyone. Although I won't be performing
>     more tests soon I'd be happy to provide more details.
>     Ferdy.
>
>
>
>
> -- 
> 阿昌

Re: our experiences with various filesystems and tuning options

Posted by ShengChang Gu <gu...@gmail.com>.
Many thanks.

We use xfs all the time. Have you tried the ext4 filesystem?

2011/5/6 Ferdy Galema <fe...@kalooga.com>

> Hi,
>
> We've performed tests for ext3 and xfs filesystems using different
> settings. The results might be useful for anyone else.
>
> The datanode cluster consists of 15 slave nodes, each equipped with 1Gbit
> ethernet, X3220@2.40GHz quadcores and 4x1TB disks. The disk read speeds
> vary from about 90 to 130MB/s. (Tested using hdparm -t).
>
> Hadoop: Cloudera CDH3u0 (4 concurrent mappers / node)
> OS: Linux version 2.6.18-238.5.1.el5 (mockbuild@builder10.centos.org) (gcc
> version 4.1.2 20080704 (Red Hat 4.1.2-50))
>
> #our command
> for i in `seq 1 10`; do ./hadoop jar ../hadoop-examples-0.20.2-cdh3u0.jar
> randomwriter -Ddfs.replication=1 /rand$i && ./hadoop fs -rmr /rand$i/_logs
> /rand$i/_SUCCESS && ./hadoop distcp -Ddfs.replication=1 /rand$i
> /rand-copy$i; done
>
> Our benchmark consists of a standard random-writer job followed by a distcp
> of the same data, both using a replication of 1. This is to make sure only
> the disks get hit. Each benchmark is ran several times for every
> configuration. Because of the occasional hickup, I will list both the
> average and the fastest times for each configuration. I read the execution
> times off the jobtracker.
>
> The configurations (with exection times in seconds of Avg-writer /
> Min-writer / Avg-distcp / Min-distcp)
> ext3-default      158 / 136 / 411 / 343
> ext3-tuned        159 / 132 / 330 / 297
> ra1024 ext3-tuned 159 / 132 / 292 / 264
> ra1024 xfs-tuned  128 / 122 / 220 / 202
>
> To explain, ext3-tuned is with tuned mount options
> [noatime,nodiratime,data=writeback,rw] and ra1024 means a read-ahead buffer
> of 1024 blocks. The xfs disks are created using mkfs options
> [size=128m,lazy-count=1] and mount options [noatime,nodiratime,logbufs=8].
>
> In conclusion it seems that using tuned xfs filesystems combined with
> increased read-ahead buffers increased our basic hdfs performance with about
> 10% (random-writer) to 40% (distcp).
>
> Hopefully this is useful to anyone. Although I won't be performing more
> tests soon I'd be happy to provide more details.
> Ferdy.
>



-- 
阿昌