Posted to user@hadoop.apache.org by Andy Isaacson <ad...@cloudera.com> on 2012/10/17 01:45:15 UTC

Re: one or more file system

RAID5 is suboptimal for HDFS due to the spindle imbalance issue (among
other problems). Read this paper for details:

"Disks are like Snowflakes: No Two Are Alike"
www.usenix.org/event/hotos11/tech/final_files/Krevat.pdf

For best performance configure your storage as JBOD instead of RAID,
format each spindle as a separate ext4 filesystem, and put a datadir
on each spindle.
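
As a rough illustration of "a datadir on each spindle", the sketch below
builds the comma-separated value you would put in the DataNode data
directory property (dfs.datanode.data.dir, or dfs.data.dir on Hadoop 1.x).
The mount points /data1 through /data12 follow the layout mentioned
elsewhere in this thread; the "dfs/dn" subdirectory and the helper name are
assumptions made only for this example.

```python
# Sketch: one data directory per mounted disk, joined into the value for
# the DataNode data-directory property. Paths are illustrative assumptions.

def datanode_data_dirs(mount_points, subdir="dfs/dn"):
    """Return a comma-separated list with exactly one data dir per mount point."""
    seen = set()
    dirs = []
    for mp in mount_points:
        mp = mp.rstrip("/")
        if mp in seen:
            # Listing a mount twice would put two datadirs on one spindle.
            raise ValueError(f"mount point listed twice: {mp}")
        seen.add(mp)
        dirs.append(f"{mp}/{subdir}")
    return ",".join(dirs)

# Hypothetical layout: twelve disks mounted at /data1 .. /data12.
mounts = [f"/data{i}" for i in range(1, 13)]
value = datanode_data_dirs(mounts)
print(value)
```

The helper rejects a repeated mount point so that the resulting list never
places two data directories on the same filesystem.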

Your disk array will have a configuration utility to set JBOD instead
of RAID. Please consult the documentation for your disk array for the
details.

If you must use RAID5 then one filesystem and one datadir is your best option.

For *BAD* performance, put multiple logical volumes on a single RAID
and put multiple datadirs on the RAID. This will result in low IOPS,
low throughput, and high contention.
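
One partial way to catch this kind of layout is to check whether two data
directories resolve to the same device as seen by the OS. The sketch below
is only that, a sketch: the function names are made up for this example,
and the device numbers in the sample data are invented. Note the
limitation: several LUNs carved from one RAID set show up as different
devices, so this check cannot see spindles shared underneath the array.

```python
import os
from collections import defaultdict

def group_by_device(dir_to_device):
    """Return {device_id: [dirs]} for every device backing more than one dir."""
    groups = defaultdict(list)
    for d, dev in dir_to_device.items():
        groups[dev].append(d)
    return {dev: sorted(ds) for dev, ds in groups.items() if len(ds) > 1}

def device_ids(dirs):
    """Look up the device id (st_dev) of each existing directory."""
    return {d: os.stat(d).st_dev for d in dirs}

# Example with invented device ids, so it runs without real mounts.
example = {"/data1": 2049, "/data2": 2065, "/data3": 2049}
shared = group_by_device(example)
print(shared)  # {2049: ['/data1', '/data3']}
```

On a real DataNode you would call group_by_device(device_ids(dirs)) with
the configured data directories; an empty result means no two directories
share a device at the OS level.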

-andy

On Tue, Oct 9, 2012 at 2:13 AM, Xiang Hua <be...@gmail.com> wrote:
> Hi,
>    How do we "configure the disk array as JBOD"? We plan to use a disk
> array with RAID5 and make LUNs of 1T.
>    So we have many LUNs of 1T each, and we run mkfs on every LUN, so we
> have 12 filesystems, /data1 ... /data12, which will be put into HDFS.
>
>
> Best R.
>
> beatls
>
> On Tue, Oct 9, 2012 at 1:45 AM, Andy Isaacson <ad...@cloudera.com> wrote:
>
>> On Mon, Oct 8, 2012 at 8:30 AM, Xiang Hua <be...@gmail.com> wrote:
>> > Hi,
>> >    We have 4T disk from a disk array.
>> >    I want to split 2T*1 into 1T*2, then add them to HDFS, which leads
>> > to more local storage directories.
>> >    This time we have 12 local directories (1T). Is it harmful to HDFS
>> > performance?
>>
>> Assuming you're running a modern Hadoop on a recent Linux (2.6.38 or
>> later, or RHEL6):
>>
>> For best performance you should configure your disk array as JBOD
>> rather than RAID, then put one ext4 filesystem on each spindle. Do not
>> put multiple storage directories on a single spindle; that results in
>> very bad performance and no benefit over a single storage directory
>> per spindle. And do not put multiple spindles under a single storage
>> directory; that results in poor utilization and bad performance with
>> no significant benefit.
>>
>> 12 local storage directories will perform just fine assuming you have
>> enough CPU power to use them.
>>
>> -andy
>>

Re: one or more file system

Posted by Arun C Murthy <ac...@hortonworks.com>.
Can you guys pls move this discussion to user@? Thanks.

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/