Posted to hdfs-user@hadoop.apache.org by Ajit Ratnaparkhi <aj...@gmail.com> on 2011/12/05 10:29:46 UTC

Re: What does hdfs balancer do after adding more disks to existing datanode.

Hi,

The dfs data directory on a datanode stores blocks in the following
directory structure.
All blocks are stored under:
<dfs.data.dir>/current/

This directory contains some blocks and some subdirectories named like
'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63).

To be precise, each directory in the hierarchy rooted at
<dfs.data.dir>/current/ contains at most 64 blocks (data + metadata) plus
at most 64 subdirectories (named subdir0 to subdir63).
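
The layout described above can be checked with a short script. This is only a sketch, not part of Hadoop: the function name check_layout is hypothetical, the block file naming pattern (blk_<id>) is an assumption, and the limits of 64 are taken from the description in this message rather than from the datanode source.

```python
import os
import re

MAX_BLOCKS_PER_DIR = 64    # limit as described in this message
MAX_SUBDIRS_PER_DIR = 64   # subdir0 .. subdir63

SUBDIR_RE = re.compile(r"^subdir(\d+)$")
BLOCK_RE = re.compile(r"^blk_-?\d+$")  # assumed naming of block data files


def check_layout(current_dir):
    """Return (directory, problem) tuples for a <dfs.data.dir>/current/ tree."""
    problems = []
    for dirpath, dirnames, filenames in os.walk(current_dir):
        # Count only block data files; .meta files are ignored here.
        blocks = [f for f in filenames if BLOCK_RE.match(f)]
        if len(blocks) > MAX_BLOCKS_PER_DIR:
            problems.append((dirpath, "%d block files" % len(blocks)))
        # Every subdirectory should be named subdir0 .. subdir63.
        for d in dirnames:
            m = SUBDIR_RE.match(d)
            if m is None or int(m.group(1)) >= MAX_SUBDIRS_PER_DIR:
                problems.append((dirpath, "unexpected subdirectory %s" % d))
    return problems
```

Calling check_layout on the current/ directory of each disk returns an empty list when the tree matches the layout described here.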

So my question is: whenever I manually move blocks across disks to
balance load onto newly added disks, do I need to maintain this
directory-hierarchy constraint, or will just putting the blocks in
<dfs.data.dir>/current/ work?

thanks,
Ajit.

On Tue, Nov 22, 2011 at 11:04 PM, Ajit Ratnaparkhi <
ajit.ratnaparkhi@gmail.com> wrote:

> Thanks Harsh!
>
>
> On Tue, Nov 22, 2011 at 10:05 PM, Harsh J <ha...@cloudera.com> wrote:
>
>> Ajit / Inder,
>>
>> Please see
>> http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
>>
>> On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi
>> <aj...@gmail.com> wrote:
>> > Thanks for Help Joey!
>> > Does just copying block files from one drive to another work?
>> > Isn't there metadata maintained at datanode about block locations on
>> that
>> > datanode? If not, then how does datanode know about blocks stored on it?
>> >
>> > -Ajit.
>> > On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria <jo...@cloudera.com>
>> wrote:
>> >>
>> >> The balancer only balances between datanodes. This means the new
>> >> drives won't get used until you start writing new data to them. If you
>> >> want to balance the drives on a node, you need to
>> >>
>> >> 1) copy a bunch of block files from the old drives to the new drives
>> >> 2) shutdown the datanode
>> >> 3) delete the old block files
>> >> 4) configure the datanode to see the new drives
>> >> 5) start the datanode
>> >>
>> >> -Joey
>> >>
>> >> On Tue, Nov 22, 2011 at 6:43 AM, Ajit Ratnaparkhi
>> >> <aj...@gmail.com> wrote:
>> >> > Hi,
>> >> > If I add additional disks to existing datanode (assume existing
>> datanode
>> >> > has
>> >> > 7 1TB disk which are already 80% full and then I add two new 2TB
>> disks
>> >> > 0%
>> >> > full) and then run balancer, does balancer balance data in a
>> datanode?
>> >> > ie.
>> >> > Will it move data from existing disks to newly added disks such that
>> all
>> >> > disks are approx equally full ?
>> >> > thanks,
>> >> > Ajit.
>> >>
>> >>
>> >>
>> >> --
>> >> Joseph Echeverria
>> >> Cloudera, Inc.
>> >> 443.305.9434
>> >
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>

Re: What does hdfs balancer do after adding more disks to existing datanode.

Posted by Harsh J <ha...@cloudera.com>.
Ajit,

Just move/merge subdirectories - it's the easiest way to go about it and does no harm. For confidence, you can also fire up a test cluster and test these things out :)
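
A minimal sketch of moving a whole subdirectory from an old disk to a new one follows. It assumes the datanode is stopped while files are moved and that it finds its blocks again by scanning the directory tree at startup. The helper names free_subdir_name and move_subdir are hypothetical, and the limit of 64 subdirectories comes from the layout described earlier in this thread. Try it on a test cluster first.

```python
import os
import re
import shutil

SUBDIR_RE = re.compile(r"^subdir(\d+)$")
MAX_SUBDIRS = 64  # subdir0 .. subdir63, as described earlier in the thread


def free_subdir_name(dst_dir):
    """Return the first unused subdirN name in dst_dir, or None if all are taken."""
    used = {n for n in os.listdir(dst_dir) if SUBDIR_RE.match(n)}
    for i in range(MAX_SUBDIRS):
        name = "subdir%d" % i
        if name not in used:
            return name
    return None


def move_subdir(src_dir, name, dst_dir):
    """Move src_dir/name (blocks and .meta files together) into a free slot in dst_dir."""
    target = free_subdir_name(dst_dir)
    if target is None:
        raise RuntimeError("no free subdir slot in %s" % dst_dir)
    dst_path = os.path.join(dst_dir, target)
    # shutil.move copies and then deletes when src and dst are on different disks.
    shutil.move(os.path.join(src_dir, name), dst_path)
    return dst_path
```

Moving an entire subdirectory keeps each block file next to its metadata file and leaves the per-directory counts unchanged, so the hierarchy constraint does not need separate handling.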

On 05-Dec-2011, at 2:59 PM, Ajit Ratnaparkhi wrote:

> Hi,
> 
> The dfs data directory on a datanode stores blocks in the following directory structure.
> All blocks are stored under:
> <dfs.data.dir>/current/
> 
> This directory contains some blocks and some subdirectories named like 'subdir*' (e.g. subdir0, subdir1, ..., subdir33, ..., subdir63).
> 
> To be precise, each directory in the hierarchy rooted at <dfs.data.dir>/current/ contains at most 64 blocks (data + metadata) plus at most 64 subdirectories (named subdir0 to subdir63).
> 
> So my question is: whenever I manually move blocks across disks to balance load onto newly added disks, do I need to maintain this directory-hierarchy constraint, or will just putting the blocks in <dfs.data.dir>/current/ work?
> 
> thanks,
> Ajit.
> 
> On Tue, Nov 22, 2011 at 11:04 PM, Ajit Ratnaparkhi <aj...@gmail.com> wrote:
> Thanks Harsh!
> 
> 
> On Tue, Nov 22, 2011 at 10:05 PM, Harsh J <ha...@cloudera.com> wrote:
> Ajit / Inder,
> 
> Please see http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
> 
> On Tue, Nov 22, 2011 at 9:44 PM, Ajit Ratnaparkhi
> <aj...@gmail.com> wrote:
> > Thanks for Help Joey!
> > Does just copying block files from one drive to another work?
> > Isn't there metadata maintained at datanode about block locations on that
> > datanode? If not, then how does datanode know about blocks stored on it?
> >
> > -Ajit.
> > On Tue, Nov 22, 2011 at 5:25 PM, Joey Echeverria <jo...@cloudera.com> wrote:
> >>
> >> The balancer only balances between datanodes. This means the new
> >> drives won't get used until you start writing new data to them. If you
> >> want to balance the drives on a node, you need to
> >>
> >> 1) copy a bunch of block files from the old drives to the new drives
> >> 2) shutdown the datanode
> >> 3) delete the old block files
> >> 4) configure the datanode to see the new drives
> >> 5) start the datanode
> >>
> >> -Joey
> >>
> >> On Tue, Nov 22, 2011 at 6:43 AM, Ajit Ratnaparkhi
> >> <aj...@gmail.com> wrote:
> >> > Hi,
> >> > If I add additional disks to existing datanode (assume existing datanode
> >> > has
> >> > 7 1TB disk which are already 80% full and then I add two new 2TB disks
> >> > 0%
> >> > full) and then run balancer, does balancer balance data in a datanode?
> >> > ie.
> >> > Will it move data from existing disks to newly added disks such that all
> >> > disks are approx equally full ?
> >> > thanks,
> >> > Ajit.
> >>
> >>
> >>
> >> --
> >> Joseph Echeverria
> >> Cloudera, Inc.
> >> 443.305.9434
> >
> >
> 
> 
> 
> --
> Harsh J
> 
> 

