Posted to common-user@hadoop.apache.org by Kris Jirapinyo <kj...@biz360.com> on 2009/08/25 21:51:27 UTC

Intra-datanode balancing?

Hi all,
    I know this has been filed as a JIRA improvement already
http://issues.apache.org/jira/browse/HDFS-343, but is there any good
workaround at the moment?  What's happening is I have added a few new EBS
volumes to half of the cluster, but Hadoop doesn't want to write to them.
When I try to do cluster rebalancing, since the new disks make the
percentage used lower, it fills up the first two existing local disks, which
is exactly what I don't want to happen.  Currently, I just delete several
subdirs from dfs, since I know that with a replication factor of 3, it'll be
ok, so that fixes the problems in the short term.  But I still cannot get
Hadoop to use those new larger disks efficiently.  Any thoughts?

-- Kris.

Re: Intra-datanode balancing?

Posted by Alex Loddengaard <al...@cloudera.com>.
Changing the ordering of dfs.data.dir won't change anything, because
dfs.data.dir is written to in a round-robin fashion.
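
Roughly, the selection works like the sketch below (made-up class and method
names, not the actual DataNode code); each new block just goes to the next
directory in the list, so reordering the entries only changes where the
cycle starts:

// A simplified sketch of round-robin volume selection over dfs.data.dir.
// The names here are made up for illustration; this is not Hadoop's code.
import java.io.File;
import java.util.List;

class RoundRobinVolumePicker {
    private final List<File> volumes;   // the directories listed in dfs.data.dir
    private int next = 0;

    RoundRobinVolumePicker(List<File> volumes) {
        this.volumes = volumes;
    }

    // Each new block goes to the next directory in order, regardless of how
    // full it is, so volume ordering has no effect on how evenly they fill.
    synchronized File pickVolumeForNewBlock() {
        File chosen = volumes.get(next);
        next = (next + 1) % volumes.size();
        return chosen;
    }
}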

Kris, I think you're stuck with the hack you're performing :(.  Sorry I
don't have better news.

Alex

On Tue, Aug 25, 2009 at 1:16 PM, Ted Dunning <te...@gmail.com> wrote:

> Change the ordering of the volumes in the config files.
>
> On Tue, Aug 25, 2009 at 12:51 PM, Kris Jirapinyo <kjirapinyo@biz360.com
> >wrote:
>
> > Hi all,
> >    I know this has been filed as a JIRA improvement already
> > http://issues.apache.org/jira/browse/HDFS-343, but is there any good
> > workaround at the moment?  What's happening is I have added a few new EBS
> > volumes to half of the cluster, but Hadoop doesn't want to write to them.
> > When I try to do cluster rebalancing, since the new disks make the
> > percentage used lower, it fills up the first two existing local disks,
> > which
> > is exactly what I don't want to happen.  Currently, I just delete several
> > subdirs from dfs, since I know that with a replication factor of 3, it'll
> > be
> > ok, so that fixes the problems in the short term.  But I still cannot get
> > Hadoop to use those new larger disks efficiently.  Any thoughts?
> >
> > -- Kris.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Intra-datanode balancing?

Posted by Ted Dunning <te...@gmail.com>.
It used to matter quite a lot.

On Tue, Aug 25, 2009 at 1:25 PM, Kris Jirapinyo
<kr...@biz360.com>wrote:

> The order matters?
>
>

Re: Intra-datanode balancing?

Posted by Kris Jirapinyo <kr...@biz360.com>.
The order matters?

On Tue, Aug 25, 2009 at 1:16 PM, Ted Dunning <te...@gmail.com> wrote:

> Change the ordering of the volumes in the config files.
>
> On Tue, Aug 25, 2009 at 12:51 PM, Kris Jirapinyo <kjirapinyo@biz360.com
> >wrote:
>
> > Hi all,
> >    I know this has been filed as a JIRA improvement already
> > http://issues.apache.org/jira/browse/HDFS-343, but is there any good
> > workaround at the moment?  What's happening is I have added a few new EBS
> > volumes to half of the cluster, but Hadoop doesn't want to write to them.
> > When I try to do cluster rebalancing, since the new disks make the
> > percentage used lower, it fills up the first two existing local disks,
> > which
> > is exactly what I don't want to happen.  Currently, I just delete several
> > subdirs from dfs, since I know that with a replication factor of 3, it'll
> > be
> > ok, so that fixes the problems in the short term.  But I still cannot get
> > Hadoop to use those new larger disks efficiently.  Any thoughts?
> >
> > -- Kris.
> >
>
>
>
> --
> Ted Dunning, CTO
> DeepDyve
>

Re: Intra-datanode balancing?

Posted by Ted Dunning <te...@gmail.com>.
Change the ordering of the volumes in the config files.

On Tue, Aug 25, 2009 at 12:51 PM, Kris Jirapinyo <kj...@biz360.com>wrote:

> Hi all,
>    I know this has been filed as a JIRA improvement already
> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
> workaround at the moment?  What's happening is I have added a few new EBS
> volumes to half of the cluster, but Hadoop doesn't want to write to them.
> When I try to do cluster rebalancing, since the new disks make the
> percentage used lower, it fills up the first two existing local disks,
> which
> is exactly what I don't want to happen.  Currently, I just delete several
> subdirs from dfs, since I know that with a replication factor of 3, it'll
> be
> ok, so that fixes the problems in the short term.  But I still cannot get
> Hadoop to use those new larger disks efficiently.  Any thoughts?
>
> -- Kris.
>



-- 
Ted Dunning, CTO
DeepDyve

Re: Intra-datanode balancing?

Posted by Kris Jirapinyo <kr...@biz360.com>.
Hmm, then in that case it is possible for me to manually balance the load on
those datanodes by moving most of the files onto the new, larger partition.  I
will try it.  Thanks!

-- Kris J.

On Wed, Aug 26, 2009 at 10:13 AM, Raghu Angadi <ra...@yahoo-inc.com>wrote:

> Kris Jirapinyo wrote:
>
>> But I mean, then how does that datanode knows that these files were copied
>> from one partition to another, in this new directory?  I'm not sure the
>> inner workings of how a datanode knows what files are on itself...I was
>> assuming that it knows by keeping track of the subdir directory...
>>
>
>
>  or is that
>> just a placeholder name and whatever directory is under that parent
>> directory will be scanned and picked up by the datanode?
>>
>
> correct. directory name does not matter. Only requirement is a block file
> and its .meta file in the same directory. When datanode starts up it scans
> all these directories and stores their path in memory.
>
> Of course, this is still a big hack! (just making it clear for readers who
> haven't seen the full context).
>
> Raghu.
>
>
>  Kris.
>>
>> On Tue, Aug 25, 2009 at 6:24 PM, Raghu Angadi <ra...@yahoo-inc.com>
>> wrote:
>>
>>  Kris Jirapinyo wrote:
>>>
>>>  How does copying the subdir work?  What if that partition already has
>>>> the
>>>> same subdir (in the case that our partition is not new but relatively
>>>> new...with maybe 10% used)?
>>>>
>>>>  You can copy the files. There isn't really any requirement on number of
>>> files in  directory. something like cp -r subdir5 dest/subdir5 might do
>>> (or
>>> rsync without --delete option). Just make sure you delete the directory
>>> from
>>> the source.
>>>
>>> Raghu.
>>>
>>>
>>>  Thanks for the suggestions so far guys.
>>>
>>>> Kris.
>>>>
>>>> On Tue, Aug 25, 2009 at 5:01 PM, Raghu Angadi <ra...@yahoo-inc.com>
>>>> wrote:
>>>>
>>>>  For now you are stuck with the hack. Sooner or later hadoop has to
>>>> handle
>>>>
>>>>> heterogeneous nodes better.
>>>>>
>>>>> In general it tries to write to all the disks irrespective of % full
>>>>> since
>>>>> that gives the best performance (assuming each partition's capabilities
>>>>> are
>>>>> same). But it is lame at handling skews.
>>>>>
>>>>> Regd your hack :
>>>>>  1. You can copy subdir to new partition rather than deleting
>>>>>   (datanodes should be shutdown).
>>>>>
>>>>>  2. I would think it is less work to implement a better policy in
>>>>> DataNode
>>>>> for this case. It would be a pretty local change. When choosing a
>>>>> partition
>>>>> for a new block, DN already knows how much freespace is left on each
>>>>> one.
>>>>> for simplest implementation you skip partitions that have less 25% of
>>>>> avg
>>>>> freespace or choose with a probability proportional to relative
>>>>> freespace.
>>>>> If it works well, file a jira.
>>>>>
>>>>> I don't think HDFS-343 is directly related to this or is likely to be
>>>>> fixed. There is another jira that makes placement policy at NameNode
>>>>> pluggable (does not affect Datanode).
>>>>>
>>>>> Raghu.
>>>>>
>>>>>
>>>>> Kris Jirapinyo wrote:
>>>>>
>>>>>  Hi all,
>>>>>
>>>>>>  I know this has been filed as a JIRA improvement already
>>>>>> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
>>>>>> workaround at the moment?  What's happening is I have added a few new
>>>>>> EBS
>>>>>> volumes to half of the cluster, but Hadoop doesn't want to write to
>>>>>> them.
>>>>>> When I try to do cluster rebalancing, since the new disks make the
>>>>>> percentage used lower, it fills up the first two existing local disks,
>>>>>> which
>>>>>> is exactly what I don't want to happen.  Currently, I just delete
>>>>>> several
>>>>>> subdirs from dfs, since I know that with a replication factor of 3,
>>>>>> it'll
>>>>>> be
>>>>>> ok, so that fixes the problems in the short term.  But I still cannot
>>>>>> get
>>>>>> Hadoop to use those new larger disks efficiently.  Any thoughts?
>>>>>>
>>>>>> -- Kris.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>
>

Re: Intra-datanode balancing?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Kris Jirapinyo wrote:
> But I mean, then how does that datanode knows that these files were copied
> from one partition to another, in this new directory?  I'm not sure the
> inner workings of how a datanode knows what files are on itself...I was
> assuming that it knows by keeping track of the subdir directory...


> or is that
> just a placeholder name and whatever directory is under that parent
> directory will be scanned and picked up by the datanode?

Correct. The directory name does not matter. The only requirement is that a 
block file and its .meta file are in the same directory. When the datanode 
starts up it scans all these directories and stores their paths in memory.
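
As a rough illustration of that startup scan (hypothetical names and a
simplified block-file naming check, not the real DataNode code):

// Walk every configured data directory and record where each block file
// lives; the matching .meta file is expected in the same directory.
import java.io.File;
import java.util.HashMap;
import java.util.Map;

class StartupBlockScanSketch {
    // block file name -> directory that currently holds it
    static Map<String, File> scan(File[] dataDirs) {
        Map<String, File> blockToDir = new HashMap<String, File>();
        for (File dir : dataDirs) {
            scanDir(dir, blockToDir);
        }
        return blockToDir;
    }

    private static void scanDir(File dir, Map<String, File> blockToDir) {
        File[] entries = dir.listFiles();
        if (entries == null) {
            return;
        }
        for (File f : entries) {
            if (f.isDirectory()) {
                // recurse; the subdir names themselves carry no meaning
                scanDir(f, blockToDir);
            } else if (f.getName().startsWith("blk_") && !f.getName().endsWith(".meta")) {
                blockToDir.put(f.getName(), f.getParentFile());
            }
        }
    }
}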

Of course, this is still a big hack! (just making it clear for readers 
who haven't seen the full context).

Raghu.

> Kris.
> 
> On Tue, Aug 25, 2009 at 6:24 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:
> 
>> Kris Jirapinyo wrote:
>>
>>> How does copying the subdir work?  What if that partition already has the
>>> same subdir (in the case that our partition is not new but relatively
>>> new...with maybe 10% used)?
>>>
>> You can copy the files. There isn't really any requirement on number of
>> files in  directory. something like cp -r subdir5 dest/subdir5 might do (or
>> rsync without --delete option). Just make sure you delete the directory from
>> the source.
>>
>> Raghu.
>>
>>
>>  Thanks for the suggestions so far guys.
>>> Kris.
>>>
>>> On Tue, Aug 25, 2009 at 5:01 PM, Raghu Angadi <ra...@yahoo-inc.com>
>>> wrote:
>>>
>>>  For now you are stuck with the hack. Sooner or later hadoop has to handle
>>>> heterogeneous nodes better.
>>>>
>>>> In general it tries to write to all the disks irrespective of % full
>>>> since
>>>> that gives the best performance (assuming each partition's capabilities
>>>> are
>>>> same). But it is lame at handling skews.
>>>>
>>>> Regd your hack :
>>>>  1. You can copy subdir to new partition rather than deleting
>>>>    (datanodes should be shutdown).
>>>>
>>>>  2. I would think it is less work to implement a better policy in
>>>> DataNode
>>>> for this case. It would be a pretty local change. When choosing a
>>>> partition
>>>> for a new block, DN already knows how much freespace is left on each one.
>>>> for simplest implementation you skip partitions that have less 25% of avg
>>>> freespace or choose with a probability proportional to relative
>>>> freespace.
>>>> If it works well, file a jira.
>>>>
>>>> I don't think HDFS-343 is directly related to this or is likely to be
>>>> fixed. There is another jira that makes placement policy at NameNode
>>>> pluggable (does not affect Datanode).
>>>>
>>>> Raghu.
>>>>
>>>>
>>>> Kris Jirapinyo wrote:
>>>>
>>>>  Hi all,
>>>>>   I know this has been filed as a JIRA improvement already
>>>>> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
>>>>> workaround at the moment?  What's happening is I have added a few new
>>>>> EBS
>>>>> volumes to half of the cluster, but Hadoop doesn't want to write to
>>>>> them.
>>>>> When I try to do cluster rebalancing, since the new disks make the
>>>>> percentage used lower, it fills up the first two existing local disks,
>>>>> which
>>>>> is exactly what I don't want to happen.  Currently, I just delete
>>>>> several
>>>>> subdirs from dfs, since I know that with a replication factor of 3,
>>>>> it'll
>>>>> be
>>>>> ok, so that fixes the problems in the short term.  But I still cannot
>>>>> get
>>>>> Hadoop to use those new larger disks efficiently.  Any thoughts?
>>>>>
>>>>> -- Kris.
>>>>>
>>>>>
>>>>>
> 


Re: Intra-datanode balancing?

Posted by Kris Jirapinyo <kr...@biz360.com>.
But I mean, then how does the datanode know that these files were copied
from one partition to another into this new directory?  I'm not sure of the
inner workings of how a datanode knows what files are on it...I was
assuming that it keeps track of the subdir directory...or is that
just a placeholder name, and whatever directory is under that parent
directory will be scanned and picked up by the datanode?

Kris.

On Tue, Aug 25, 2009 at 6:24 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:

> Kris Jirapinyo wrote:
>
>> How does copying the subdir work?  What if that partition already has the
>> same subdir (in the case that our partition is not new but relatively
>> new...with maybe 10% used)?
>>
>
> You can copy the files. There isn't really any requirement on number of
> files in  directory. something like cp -r subdir5 dest/subdir5 might do (or
> rsync without --delete option). Just make sure you delete the directory from
> the source.
>
> Raghu.
>
>
>  Thanks for the suggestions so far guys.
>>
>> Kris.
>>
>> On Tue, Aug 25, 2009 at 5:01 PM, Raghu Angadi <ra...@yahoo-inc.com>
>> wrote:
>>
>>  For now you are stuck with the hack. Sooner or later hadoop has to handle
>>> heterogeneous nodes better.
>>>
>>> In general it tries to write to all the disks irrespective of % full
>>> since
>>> that gives the best performance (assuming each partition's capabilities
>>> are
>>> same). But it is lame at handling skews.
>>>
>>> Regd your hack :
>>>  1. You can copy subdir to new partition rather than deleting
>>>    (datanodes should be shutdown).
>>>
>>>  2. I would think it is less work to implement a better policy in
>>> DataNode
>>> for this case. It would be a pretty local change. When choosing a
>>> partition
>>> for a new block, DN already knows how much freespace is left on each one.
>>> for simplest implementation you skip partitions that have less 25% of avg
>>> freespace or choose with a probability proportional to relative
>>> freespace.
>>> If it works well, file a jira.
>>>
>>> I don't think HDFS-343 is directly related to this or is likely to be
>>> fixed. There is another jira that makes placement policy at NameNode
>>> pluggable (does not affect Datanode).
>>>
>>> Raghu.
>>>
>>>
>>> Kris Jirapinyo wrote:
>>>
>>>  Hi all,
>>>>   I know this has been filed as a JIRA improvement already
>>>> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
>>>> workaround at the moment?  What's happening is I have added a few new
>>>> EBS
>>>> volumes to half of the cluster, but Hadoop doesn't want to write to
>>>> them.
>>>> When I try to do cluster rebalancing, since the new disks make the
>>>> percentage used lower, it fills up the first two existing local disks,
>>>> which
>>>> is exactly what I don't want to happen.  Currently, I just delete
>>>> several
>>>> subdirs from dfs, since I know that with a replication factor of 3,
>>>> it'll
>>>> be
>>>> ok, so that fixes the problems in the short term.  But I still cannot
>>>> get
>>>> Hadoop to use those new larger disks efficiently.  Any thoughts?
>>>>
>>>> -- Kris.
>>>>
>>>>
>>>>
>>
>

Re: Intra-datanode balancing?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Kris Jirapinyo wrote:
> How does copying the subdir work?  What if that partition already has the
> same subdir (in the case that our partition is not new but relatively
> new...with maybe 10% used)?

You can copy the files. There isn't really any requirement on the number of 
files in a directory. Something like cp -r subdir5 dest/subdir5 might do 
(or rsync without the --delete option). Just make sure you delete the 
directory from the source afterwards.

Raghu.

> Thanks for the suggestions so far guys.
> 
> Kris.
> 
> On Tue, Aug 25, 2009 at 5:01 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:
> 
>> For now you are stuck with the hack. Sooner or later hadoop has to handle
>> heterogeneous nodes better.
>>
>> In general it tries to write to all the disks irrespective of % full since
>> that gives the best performance (assuming each partition's capabilities are
>> same). But it is lame at handling skews.
>>
>> Regd your hack :
>>  1. You can copy subdir to new partition rather than deleting
>>     (datanodes should be shutdown).
>>
>>  2. I would think it is less work to implement a better policy in DataNode
>> for this case. It would be a pretty local change. When choosing a partition
>> for a new block, DN already knows how much freespace is left on each one.
>> for simplest implementation you skip partitions that have less 25% of avg
>> freespace or choose with a probability proportional to relative freespace.
>> If it works well, file a jira.
>>
>> I don't think HDFS-343 is directly related to this or is likely to be
>> fixed. There is another jira that makes placement policy at NameNode
>> pluggable (does not affect Datanode).
>>
>> Raghu.
>>
>>
>> Kris Jirapinyo wrote:
>>
>>> Hi all,
>>>    I know this has been filed as a JIRA improvement already
>>> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
>>> workaround at the moment?  What's happening is I have added a few new EBS
>>> volumes to half of the cluster, but Hadoop doesn't want to write to them.
>>> When I try to do cluster rebalancing, since the new disks make the
>>> percentage used lower, it fills up the first two existing local disks,
>>> which
>>> is exactly what I don't want to happen.  Currently, I just delete several
>>> subdirs from dfs, since I know that with a replication factor of 3, it'll
>>> be
>>> ok, so that fixes the problems in the short term.  But I still cannot get
>>> Hadoop to use those new larger disks efficiently.  Any thoughts?
>>>
>>> -- Kris.
>>>
>>>
> 


Re: Intra-datanode balancing?

Posted by Kris Jirapinyo <kr...@biz360.com>.
How does copying the subdir work?  What if that partition already has the
same subdir (in the case that our partition is not brand new but relatively
new...with maybe 10% used)?

Thanks for the suggestions so far guys.

Kris.

On Tue, Aug 25, 2009 at 5:01 PM, Raghu Angadi <ra...@yahoo-inc.com> wrote:

>
> For now you are stuck with the hack. Sooner or later hadoop has to handle
> heterogeneous nodes better.
>
> In general it tries to write to all the disks irrespective of % full since
> that gives the best performance (assuming each partition's capabilities are
> same). But it is lame at handling skews.
>
> Regd your hack :
>  1. You can copy subdir to new partition rather than deleting
>     (datanodes should be shutdown).
>
>  2. I would think it is less work to implement a better policy in DataNode
> for this case. It would be a pretty local change. When choosing a partition
> for a new block, DN already knows how much freespace is left on each one.
> for simplest implementation you skip partitions that have less 25% of avg
> freespace or choose with a probability proportional to relative freespace.
> If it works well, file a jira.
>
> I don't think HDFS-343 is directly related to this or is likely to be
> fixed. There is another jira that makes placement policy at NameNode
> pluggable (does not affect Datanode).
>
> Raghu.
>
>
> Kris Jirapinyo wrote:
>
>> Hi all,
>>    I know this has been filed as a JIRA improvement already
>> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
>> workaround at the moment?  What's happening is I have added a few new EBS
>> volumes to half of the cluster, but Hadoop doesn't want to write to them.
>> When I try to do cluster rebalancing, since the new disks make the
>> percentage used lower, it fills up the first two existing local disks,
>> which
>> is exactly what I don't want to happen.  Currently, I just delete several
>> subdirs from dfs, since I know that with a replication factor of 3, it'll
>> be
>> ok, so that fixes the problems in the short term.  But I still cannot get
>> Hadoop to use those new larger disks efficiently.  Any thoughts?
>>
>> -- Kris.
>>
>>
>

Re: Intra-datanode balancing?

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
For now you are stuck with the hack. Sooner or later Hadoop has to 
handle heterogeneous nodes better.

In general it tries to write to all the disks irrespective of % full 
since that gives the best performance (assuming each partition's 
capabilities are the same). But it is lame at handling skews.

Regarding your hack:
   1. You can copy a subdir to the new partition rather than deleting it
      (the datanodes should be shut down first).

   2. I would think it is less work to implement a better policy in 
DataNode for this case. It would be a pretty local change. When choosing 
a partition for a new block, the DN already knows how much free space is 
left on each one. For the simplest implementation you skip partitions that 
have less than 25% of the average free space, or choose with a probability 
proportional to relative free space. If it works well, file a JIRA.
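
A rough sketch of what those two options could look like (the class and
method names are made up, and getUsableSpace() just stands in for however
the DN tracks per-partition free space):

// Not meant as a drop-in patch, only to show how local the change could be.
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class FreeSpaceAwareVolumePicker {
    private final List<File> volumes;
    private final Random rng = new Random();

    FreeSpaceAwareVolumePicker(List<File> volumes) {
        this.volumes = volumes;
    }

    // Option 1: skip partitions with less than 25% of the average free space,
    // then pick among the rest (random shown here; round-robin would also do).
    File pickSkippingNearlyFull() {
        long total = 0;
        for (File v : volumes) {
            total += v.getUsableSpace();
        }
        long avg = total / volumes.size();
        List<File> candidates = new ArrayList<File>();
        for (File v : volumes) {
            if (v.getUsableSpace() >= avg / 4) {
                candidates.add(v);
            }
        }
        if (candidates.isEmpty()) {
            candidates = volumes;   // fall back if every partition is low
        }
        return candidates.get(rng.nextInt(candidates.size()));
    }

    // Option 2: choose with probability proportional to each partition's free space.
    File pickWeightedByFreeSpace() {
        long total = 0;
        for (File v : volumes) {
            total += v.getUsableSpace();
        }
        if (total <= 0) {
            return volumes.get(rng.nextInt(volumes.size()));
        }
        long r = (long) (rng.nextDouble() * total);
        long cumulative = 0;
        for (File v : volumes) {
            cumulative += v.getUsableSpace();
            if (r < cumulative) {
                return v;
            }
        }
        return volumes.get(volumes.size() - 1);   // guard against rounding at the top end
    }
}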

I don't think HDFS-343 is directly related to this or is likely to be 
fixed. There is another JIRA that makes the placement policy at the 
NameNode pluggable (it does not affect the DataNode).

Raghu.

Kris Jirapinyo wrote:
> Hi all,
>     I know this has been filed as a JIRA improvement already
> http://issues.apache.org/jira/browse/HDFS-343, but is there any good
> workaround at the moment?  What's happening is I have added a few new EBS
> volumes to half of the cluster, but Hadoop doesn't want to write to them.
> When I try to do cluster rebalancing, since the new disks make the
> percentage used lower, it fills up the first two existing local disks, which
> is exactly what I don't want to happen.  Currently, I just delete several
> subdirs from dfs, since I know that with a replication factor of 3, it'll be
> ok, so that fixes the problems in the short term.  But I still cannot get
> Hadoop to use those new larger disks efficiently.  Any thoughts?
> 
> -- Kris.
>