Posted to user@hadoop.apache.org by Manoj Venkatesh <ma...@gmail.com> on 2015/02/06 20:34:43 UTC

Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes; 6 were added during cluster creation
and 2 additional nodes were added later to increase disk and CPU capacity.
What I see is that processing is shared amongst all the nodes, whereas
storage is reaching capacity on the original 6 nodes while the newly
added machines still have a relatively large amount of storage unoccupied.

I was wondering if there is an automated way, or any way at all, of
redistributing data so that all the nodes are equally utilized. I have
checked the configuration parameter
*dfs.datanode.fsdataset.volume.choosing.policy*, which has the options
'Round Robin' and 'Available Space'. Are there any other configurations
which need to be reviewed?

Thanks,
Manoj

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.
Hi Manoj,

You need to run the balancer to re-balance data between nodes.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer
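For example, a minimal sketch of invoking it (this assumes a running
cluster with the hadoop binaries on your PATH; the threshold value here
is just illustrative):

```shell
# Move blocks until each datanode's utilization is within 10 percentage
# points of the cluster-wide average (10 is also the default threshold).
hdfs balancer -threshold 10

# Optionally cap the bandwidth the balancer may use, in bytes per second
# (here ~100 MB/s), so balancing does not starve running jobs.
hdfs dfsadmin -setBalancerBandwidth 104857600
```

The balancer can be stopped and re-run at any time; it only moves block
replicas and never changes file contents.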

 > *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
 > Robin' or 'Available Space', are there any other configurations which
 > need to be reviewed.
That option only chooses among the disks within a single node; it does
not redistribute data across nodes.
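For reference, that per-node policy is set in hdfs-site.xml; a sketch
(class name as in the HDFS docs; the default is the round-robin policy):

```xml
<!-- hdfs-site.xml: controls which of a datanode's OWN disks receives a
     new block replica. It does not balance data across datanodes. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```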

Regards,
Akira

On 2/6/15 11:34, Manoj Venkatesh wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU
> capacity. What i see is that processing is shared amongst all the nodes
> whereas the storage is reaching capacity on the original 6 nodes whereas
> the newly added machines have relatively large amount of storage still
> unoccupied.
>
> I was wondering if there is an automated or any way of redistributing
> data so that all the nodes are equally utilized. I have checked for the
> configuration parameter -
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
> Robin' or 'Available Space', are there any other configurations which
> need to be reviewed.
>
> Thanks,
> Manoj


Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Artem Ervits <ar...@gmail.com>.
Look at hdfs balancer

Artem Ervits
On Feb 6, 2015 5:54 PM, "Manoj Venkatesh" <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Ahmed Ossama <ah...@aossama.com>.
Hi,

Have you tried:

$ hdfs balancer

On 02/06/2015 09:34 PM, Manoj Venkatesh wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster 
> creation and 2 additional nodes were added later to increase disk and 
> CPU capacity. What i see is that processing is shared amongst all the 
> nodes whereas the storage is reaching capacity on the original 6 nodes 
> whereas the newly added machines have relatively large amount of 
> storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing 
> data so that all the nodes are equally utilized. I have checked for 
> the configuration parameter - 
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round 
> Robin' or 'Available Space', are there any other configurations which 
> need to be reviewed.
>
> Thanks,
> Manoj

-- 
Regards,
Ahmed Ossama


Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by David DONG <do...@gmail.com>.
Have you tried hdfs balancer?

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer

On Fri, Feb 6, 2015 at 11:34 AM, Manoj Venkatesh <ma...@gmail.com>
wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Chandrashekhar Kotekar <sh...@gmail.com>.
First confirm whether the new nodes have actually joined the cluster.
You can use the "hadoop dfsadmin -report" command to check per-node HDFS
usage. If the new nodes are listed in its output, then you can run the
balancer to manually redistribute some of the data.
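For example, per-node usage can be pulled out of that report like this
(the sample report text below is hypothetical and abridged; `hdfs
dfsadmin` is the current form of the command):

```shell
# Hypothetical, abridged sample of "hdfs dfsadmin -report" output;
# the real command prints one such block per live datanode.
report='Name: 10.0.0.1:50010
DFS Used%: 91.20%
Name: 10.0.0.7:50010
DFS Used%: 12.40%'

# Pull out per-node utilization to spot the imbalance between the
# original nodes and the newly added ones.
echo "$report" | awk '/^Name:/ {node=$2} /^DFS Used%/ {print node, $3}'
# → 10.0.0.1:50010 91.20%
# → 10.0.0.7:50010 12.40%
```

A large spread between the highest and lowest "DFS Used%" values is the
signal that a balancer run is worthwhile.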

Regards,
Chandrashekhar
On 07-Feb-2015 4:24 AM, "Manoj Venkatesh" <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Manoj Venkatesh <ma...@xoom.com>.
Thank you all for answering; the hdfs balancer worked. The datanodes' capacity is now more or less equally balanced.

Regards,
Manoj

From: Arpit Agarwal <aa...@hortonworks.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Friday, February 6, 2015 at 3:07 PM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at the 'hdfs balancer' command which can be run as a separate administrative tool to rebalance data distribution across DataNodes.


From: Manoj Venkatesh <ma...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Friday, February 6, 2015 at 11:34 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What i see is that processing is shared amongst all the nodes whereas the storage is reaching capacity on the original 6 nodes whereas the newly added machines have relatively large amount of storage still unoccupied.

I was wondering if there is an automated or any way of redistributing data so that all the nodes are equally utilized. I have checked for the configuration parameter - dfs.datanode.fsdataset.volume.choosing.policy have options 'Round Robin' or 'Available Space', are there any other configurations which need to be reviewed.

Thanks,
Manoj

The information transmitted in this email is intended only for the person or entity to which it is addressed, and may contain material confidential to Xoom Corporation, and/or its subsidiary, buyindiaonline.com Inc. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient(s) is prohibited. If you received this email in error, please contact the sender and delete the material from your files.

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Arpit Agarwal <aa...@hortonworks.com>.
Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at the 'hdfs balancer' command which can be run as a separate administrative tool to rebalance data distribution across DataNodes.


From: Manoj Venkatesh <ma...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Friday, February 6, 2015 at 11:34 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What i see is that processing is shared amongst all the nodes whereas the storage is reaching capacity on the original 6 nodes whereas the newly added machines have relatively large amount of storage still unoccupied.

I was wondering if there is an automated or any way of redistributing data so that all the nodes are equally utilized. I have checked for the configuration parameter - dfs.datanode.fsdataset.volume.choosing.policy have options 'Round Robin' or 'Available Space', are there any other configurations which need to be reviewed.

Thanks,
Manoj

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Todd Snyder <ts...@blackberry.com>.
Look at the hadoop balancer - it will move data around and balance it across the nodes.

Sent from the wilds on my BlackBerry smartphone.
From: Manoj Venkatesh
Sent: Friday, February 6, 2015 5:54 PM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?


Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What i see is that processing is shared amongst all the nodes whereas the storage is reaching capacity on the original 6 nodes whereas the newly added machines have relatively large amount of storage still unoccupied.

I was wondering if there is an automated or any way of redistributing data so that all the nodes are equally utilized. I have checked for the configuration parameter - dfs.datanode.fsdataset.volume.choosing.policy have options 'Round Robin' or 'Available Space', are there any other configurations which need to be reviewed.

Thanks,
Manoj

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Vikas Parashar <pa...@gmail.com>.
Hi Manoj,

Pls try

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer


Rg:
Vikas Parashar (Vicky)

On Sat, Feb 7, 2015 at 1:04 AM, Manoj Venkatesh <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Vikas Parashar <pa...@gmail.com>.
Hi Manoj,

Pls try

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer


Rg:
Vikas Parashar (Vicky)

On Sat, Feb 7, 2015 at 1:04 AM, Manoj Venkatesh <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by David DONG <do...@gmail.com>.
Have you tried hdfs balancer?

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#balancer

On Fri, Feb 6, 2015 at 11:34 AM, Manoj Venkatesh <ma...@gmail.com>
wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Akira AJISAKA <aj...@oss.nttdata.co.jp>.
Hi Manoj,

You need to use the balancer to re-balance data between nodes.
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer

 > *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
 > Robin' or 'Available Space', are there any other configurations which
 > need to be reviewed.
The option is for the disks in a node.
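To make Akira's point concrete: the property lives in hdfs-site.xml and only steers which local disk a DataNode writes new blocks to. A sketch, assuming a Hadoop 2.x release (verify the policy class name against your version):

```xml
<!-- hdfs-site.xml: pick the volume (disk) within a DataNode by available
     space instead of round-robin. This affects new block placement across
     a single node's disks only; it never moves data between nodes. -->
<property>
  <name>dfs.datanode.fsdataset.volume.choosing.policy</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy</value>
</property>
```

Inter-node skew, as above, is addressed by the balancer, not by this property.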

Regards,
Akira

On 2/6/15 11:34, Manoj Venkatesh wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU
> capacity. What i see is that processing is shared amongst all the nodes
> whereas the storage is reaching capacity on the original 6 nodes whereas
> the newly added machines have relatively large amount of storage still
> unoccupied.
>
> I was wondering if there is an automated or any way of redistributing
> data so that all the nodes are equally utilized. I have checked for the
> configuration parameter -
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round
> Robin' or 'Available Space', are there any other configurations which
> need to be reviewed.
>
> Thanks,
> Manoj


Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Artem Ervits <ar...@gmail.com>.
Look at hdfs balancer

Artem Ervits
On Feb 6, 2015 5:54 PM, "Manoj Venkatesh" <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Chandrashekhar Kotekar <sh...@gmail.com>.
First, confirm whether the new nodes have actually joined the cluster. You can
use the "hadoop dfsadmin -report" command to check per-node HDFS usage.
If the new nodes are listed in the output, you can run the Hadoop balancer to
manually redistribute some of the data.
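The two steps above can be sketched as a dry run that only prints the commands, so it is safe to review before pointing it at a real cluster; the -threshold value of 10 (percentage points of deviation from the cluster-average utilization) is the balancer's customary default, not a tuned recommendation:

```shell
# Dry-run sketch of the two steps above. 'run' echoes each command;
# swap the echo for real execution on an actual cluster.
run() { echo "+ $*"; }

THRESHOLD=10   # allowed deviation, in percentage points, from average use

# Step 1: list DataNodes and their per-node DFS usage; the newly added
# nodes should appear in this report before balancing is attempted.
run hdfs dfsadmin -report

# Step 2: move blocks until every node is within THRESHOLD of the average.
run hdfs balancer -threshold "$THRESHOLD"
```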

Regards,
Chandrashekhar
On 07-Feb-2015 4:24 AM, "Manoj Venkatesh" <ma...@gmail.com> wrote:

> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation
> and 2 additional nodes were added later to increase disk and CPU capacity.
> What i see is that processing is shared amongst all the nodes whereas the
> storage is reaching capacity on the original 6 nodes whereas the newly
> added machines have relatively large amount of storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing data
> so that all the nodes are equally utilized. I have checked for the
> configuration parameter - *dfs.datanode.fsdataset.volume.choosing.policy*
> have options 'Round Robin' or 'Available Space', are there any other
> configurations which need to be reviewed.
>
> Thanks,
> Manoj
>

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Arpit Agarwal <aa...@hortonworks.com>.
Hi Manoj,

Existing data is not automatically redistributed when you add new DataNodes. Take a look at the 'hdfs balancer' command which can be run as a separate administrative tool to rebalance data distribution across DataNodes.
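A minimal sketch of running it as such a separate administrative step, again as a dry run; the bandwidth cap of 52428800 bytes/s (50 MB/s) is purely illustrative, not a recommendation:

```shell
# Dry-run sketch: 'run' echoes each command instead of executing it.
run() { echo "+ $*"; }

# Optionally cap the network bandwidth each DataNode may spend on
# balancing, so the balancer does not starve running jobs
# (52428800 bytes/s = 50 MB/s, an illustrative figure).
run hdfs dfsadmin -setBalancerBandwidth 52428800

# Rebalance; 10 is the usual default threshold in percentage points.
run hdfs balancer -threshold 10
```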


From: Manoj Venkatesh <ma...@gmail.com>>
Reply-To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Date: Friday, February 6, 2015 at 11:34 AM
To: "user@hadoop.apache.org<ma...@hadoop.apache.org>" <us...@hadoop.apache.org>>
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?

Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What i see is that processing is shared amongst all the nodes whereas the storage is reaching capacity on the original 6 nodes whereas the newly added machines have relatively large amount of storage still unoccupied.

I was wondering if there is an automated or any way of redistributing data so that all the nodes are equally utilized. I have checked for the configuration parameter - dfs.datanode.fsdataset.volume.choosing.policy have options 'Round Robin' or 'Available Space', are there any other configurations which need to be reviewed.

Thanks,
Manoj

Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Ahmed Ossama <ah...@aossama.com>.
Hi,

Have you tried:

$ hdfs balancer

On 02/06/2015 09:34 PM, Manoj Venkatesh wrote:
> Dear Hadoop experts,
>
> I have a Hadoop cluster of 8 nodes, 6 were added during cluster 
> creation and 2 additional nodes were added later to increase disk and 
> CPU capacity. What i see is that processing is shared amongst all the 
> nodes whereas the storage is reaching capacity on the original 6 nodes 
> whereas the newly added machines have relatively large amount of 
> storage still unoccupied.
>
> I was wondering if there is an automated or any way of redistributing 
> data so that all the nodes are equally utilized. I have checked for 
> the configuration parameter - 
> *dfs.datanode.fsdataset.volume.choosing.policy* have options 'Round 
> Robin' or 'Available Space', are there any other configurations which 
> need to be reviewed.
>
> Thanks,
> Manoj

-- 
Regards,
Ahmed Ossama


Re: Adding datanodes to Hadoop cluster - Will data redistribute?

Posted by Todd Snyder <ts...@blackberry.com>.
Look at the Hadoop balancer - it will move data around and balance it across the nodes.

Sent from the wilds on my BlackBerry smartphone.
From: Manoj Venkatesh
Sent: Friday, February 6, 2015 5:54 PM
To: user@hadoop.apache.org
Reply To: user@hadoop.apache.org
Subject: Adding datanodes to Hadoop cluster - Will data redistribute?


Dear Hadoop experts,

I have a Hadoop cluster of 8 nodes, 6 were added during cluster creation and 2 additional nodes were added later to increase disk and CPU capacity. What i see is that processing is shared amongst all the nodes whereas the storage is reaching capacity on the original 6 nodes whereas the newly added machines have relatively large amount of storage still unoccupied.

I was wondering if there is an automated or any way of redistributing data so that all the nodes are equally utilized. I have checked for the configuration parameter - dfs.datanode.fsdataset.volume.choosing.policy have options 'Round Robin' or 'Available Space', are there any other configurations which need to be reviewed.

Thanks,
Manoj
