Posted to common-user@hadoop.apache.org by Arun Ramakrishnan <ar...@languageweaver.com> on 2010/07/08 03:18:56 UTC

rebalancing replication help

There doesn't seem to be much activity on the hdfs-user list, so I am reposting this to the general list.

Hi guys,
  I have a few related questions. I am going to lay out the steps I have taken; please comment on what I could do better.

  I was trying to add 5 nodes to my existing 10-node cluster and also increase the replication factor from 2 to 3.
I thought I wouldn't have to run the balancer, because re-replication would most likely place the new replicas on the new nodes.

There are about 500k blocks.
I wanted everything (replication and balancing) to stabilize within 24 hours. It has now been more than 24 hours, and fsck reports 30% of blocks under-replicated. Is there a way to force HDFS to balance/replicate more aggressively?
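
For a rough sense of scale, the 24-hour target can be turned into a required copy rate. This is only a back-of-the-envelope sketch: the block count and replication factors come from the post, the 64 MB block size is an assumed worst case (real blocks may be smaller), and the function name is made up for illustration.

```python
# Back-of-the-envelope estimate of the copy rate needed to raise replication
# from 2 to 3 within a deadline. Block size is an ASSUMPTION, not from the post.

def required_rates(num_blocks, old_rep, new_rep, hours, block_mb=64):
    """Return (new_replicas, blocks_per_second, worst_case_MB_per_second)."""
    new_replicas = num_blocks * (new_rep - old_rep)
    seconds = hours * 3600
    blocks_per_sec = new_replicas / seconds
    mb_per_sec = blocks_per_sec * block_mb  # worst case: every block is full
    return new_replicas, blocks_per_sec, mb_per_sec

replicas, bps, mbps = required_rates(500_000, 2, 3, hours=24)
print(f"{replicas} new replicas, {bps:.2f} blocks/s, up to {mbps:.0f} MB/s cluster-wide")
```

The result is a cluster-wide average; how it spreads across datanodes depends on where the namenode schedules the copies.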

It would be great if someone explained what/when things happen to blocks in the context of

1)      Rebalancing

2)      -setrep

3)      Restarting cluster with a higher/lower replication factor.

A few questions and a few issues here.

1)      When you restart the cluster with a higher replication value than before, does it also apply to existing blocks or only to new blocks being created?

2)      Does the balancer take under-replication of blocks into account, or does it blindly start moving existing blocks to reach the threshold?


A very specific problem: -setrep hangs on one particular block for hours. Is this because the block is corrupt? fsck says it is healthy.


Thanks
Arun

RE: rebalancing replication help

Posted by Arun Ramakrishnan <ar...@languageweaver.com>.

Thanks Edward.

Is dfs.balance.bandwidthPerSec the setting that controls how aggressively replication is performed? I know the balancer uses it, but I can't find any other parameter that makes sense in this context.
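
To see what a per-datanode throttle of this kind implies, here is a small arithmetic sketch. It assumes that dfs.balance.bandwidthPerSec is expressed in bytes per second and defaults to 1048576 (1 MB/s); check both against the release in use. The 500 GB figure and the function name are hypothetical.

```python
# Rough time for throttled transfers to move a given amount of data to one node.
# ASSUMPTION: the throttle is in bytes/second with a default of 1048576.

DEFAULT_BALANCE_BANDWIDTH = 1_048_576  # bytes per second (assumed default)

def hours_to_move(gigabytes, bandwidth_bytes_per_sec=DEFAULT_BALANCE_BANDWIDTH):
    """Hours needed to move `gigabytes` (GiB) at the given throttle."""
    total_bytes = gigabytes * 1024 ** 3
    return total_bytes / bandwidth_bytes_per_sec / 3600

# Hypothetical figure: 500 GB destined for a single new node.
print(f"default cap: {hours_to_move(500):.1f} h")
print(f"10 MB/s cap: {hours_to_move(500, 10 * 1_048_576):.1f} h")
```

Under these assumptions the throttle, not the network, would dominate the time needed to fill a new node.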

Thanks
Arun

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxguru@gmail.com] 
Sent: Wednesday, July 07, 2010 6:58 PM
To: common-user@hadoop.apache.org
Cc: general@hadoop.apache.org
Subject: Re: rebalancing replication help


Re: rebalancing replication help

Posted by Edward Capriolo <ed...@gmail.com>.

On Wed, Jul 7, 2010 at 9:18 PM, Arun Ramakrishnan
<ar...@languageweaver.com> wrote:
> 2)      -setrep
This changes the replication factor of an existing file; replication to
the new target should then start in the background.
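
A minimal sketch of how such a call might be assembled from a script. It only builds the command line and does not run it. The helper name and the /data path are placeholders; the -R and -w flags are the recursive and wait options of `hadoop fs -setrep`.

```python
import shlex

def setrep_command(path, replication, recursive=True, wait=False):
    """Build (but do not run) a `hadoop fs -setrep` command line."""
    cmd = ["hadoop", "fs", "-setrep"]
    if recursive:
        cmd.append("-R")   # apply to every file under a directory
    if wait:
        cmd.append("-w")   # block until the target replication is reached
    cmd += [str(replication), path]
    return cmd

# /data is a placeholder path, not one from this thread.
print(shlex.join(setrep_command("/data", 3)))
```

With -w the command waits for replication to finish, so a block that is slow to replicate can make it look stuck. Whether that is what happened here is not established.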

> 2) Does the balancer take into account under replication of blocks or does it blindly start moving existing blocks to reach threshold ?

The most under-replicated files should be prioritized.

> 3)      Restarting cluster with a higher/lower replication factor.
This only affects newly created files for which the client has not
specified a value.
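
The behaviour described here can be pictured with a toy model. This is not HDFS code: the class and method names are invented, and it only illustrates that a file keeps the factor recorded when it was created, so changing the default does not touch existing files.

```python
# Toy model (NOT HDFS code): each file records its replication factor at creation.

class ToyNamespace:
    def __init__(self, default_replication):
        self.default_replication = default_replication
        self.files = {}  # path -> replication factor stored with the file

    def create(self, path, replication=None):
        # The default is consulted only when the client gives no value.
        self.files[path] = (
            self.default_replication if replication is None else replication
        )

    def setrep(self, path, replication):
        # Changing an existing file requires an explicit per-file update.
        self.files[path] = replication

ns = ToyNamespace(default_replication=2)
ns.create("/old")
ns.default_replication = 3           # stands in for "restart with a higher default"
ns.create("/new")
ns.create("/pinned", replication=1)  # client-specified value wins
print(ns.files)
```

In this model, only an explicit setrep call would move /old from 2 to 3.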

> A very specific problem .  I am having this strange problem where the -setrep hangs on one particular block for hours. Is this because its corrupt ?. But, fsck said its healthy.
Not sure

> Its more than 24 hours now and fsck reports 30% under
There is a configuration setting for maximum replication bandwidth.
You might have to tune that.
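
Before and after tuning, it may help to estimate how long the backlog will take at the observed rate. The sketch below assumes you note the under-replicated block count from two fsck runs; the readings shown are hypothetical and the function name is made up.

```python
def eta_hours(count_then, count_now, hours_between):
    """Hours until the under-replicated count reaches zero at the observed rate."""
    fixed = count_then - count_now
    if fixed <= 0:
        return None  # no progress observed, so no estimate is possible
    rate_per_hour = fixed / hours_between
    return count_now / rate_per_hour

# Hypothetical readings: 200k under-replicated blocks, then 150k two hours later.
print(eta_hours(200_000, 150_000, 2))
```

A linear extrapolation like this is crude, but it shows quickly whether a configuration change made any difference.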