Posted to common-user@hadoop.apache.org by Yu Li <ca...@gmail.com> on 2010/06/29 06:32:25 UTC

Question about disk space allocation in hadoop

Hi all,

As we all know, a machine in a Hadoop cluster may be both a datanode and a
tasktracker, so one machine may store both MR job intermediate data
and HDFS data. My question is: if we have more than one disk per node,
say 4 disks, and would like both the job intermediate data and the HDFS
data to be stored on all disks to reduce the IO time of each single disk,
can we draw a line between the space used by the local FS and by HDFS? For
example, can we restrict the intermediate temp data to occupy no more than
25% of the space on each disk?
Thanks in advance.

Best Regards,
Carp

Re: Question about disk space allocation in hadoop

Posted by Steve Loughran <st...@apache.org>.
Yu Li wrote:
> Hi all,
> 
> Does anybody have experience with this? Any comments/suggestions would be
> highly appreciated. Thanks.
> 
> Best Regards,
> Carp
> 

There is a configuration parameter that limits the space used by either HDFS
or temp storage, but I forget its name; you'll have to look through the
docs.

-steve

Re: Question about disk space allocation in hadoop

Posted by Yu Li <ca...@gmail.com>.
Hi all,

Does anybody have experience with this? Any comments/suggestions would be
highly appreciated. Thanks.

Best Regards,
Carp

Re: Question about disk space allocation in hadoop

Posted by Yu Li <ca...@gmail.com>.
Hi Steve and Vitaliy,

Thanks a lot for your answers, and thanks to Vitaliy for the suggestion. I'll
send such questions to the relevant mailing list :)

Best Regards,
Carp
2010/6/30 Vitaliy Semochkin <vi...@gmail.com>

> Set dfs.datanode.du.reserved to the number of bytes you want to reserve for
> non-HDFS usage.
>
> PS: For search convenience, IMHO it's better to post such questions to
> hdfs-user@hadoop.apache.org ;-)
>
>
> Regards,
> Vitaliy S
>

Re: Question about disk space allocation in hadoop

Posted by Vitaliy Semochkin <vi...@gmail.com>.
Set dfs.datanode.du.reserved to the number of bytes you want to reserve for
non-HDFS usage.
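
For the 25% example in the question, the value could be worked out roughly
as below. This is only a sketch: the 1 TiB disk size and the helper names
reserved_bytes and property_xml are assumptions made for illustration, not
anything from Hadoop itself. As far as I understand it, the setting applies
per volume and keeps HDFS out of the reserved space; it does not by itself
stop the temp data from growing beyond that share.

```python
def reserved_bytes(capacity_bytes: int, non_hdfs_percent: int) -> int:
    """Bytes to keep free for non-HDFS use on one volume (hypothetical helper)."""
    if not 0 <= non_hdfs_percent <= 100:
        raise ValueError("non_hdfs_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding on large byte counts.
    return capacity_bytes * non_hdfs_percent // 100


def property_xml(name: str, value: int) -> str:
    """Render one <property> element in the Hadoop site-file layout."""
    return (
        "<property>\n"
        f"  <name>{name}</name>\n"
        f"  <value>{value}</value>\n"
        "</property>"
    )


if __name__ == "__main__":
    one_tib = 1024 ** 4  # assumed capacity of each disk; adjust to your hardware
    reserve = reserved_bytes(one_tib, 25)  # keep 25% of the disk for non-HDFS data
    print(property_xml("dfs.datanode.du.reserved", reserve))
```

The printed property element would go into the datanode's HDFS site
configuration file; check the docs for your release for the exact file name.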

PS: For search convenience, IMHO it's better to post such questions to
hdfs-user@hadoop.apache.org ;-)


Regards,
Vitaliy S
