
Posted to mapreduce-user@hadoop.apache.org by ch huang <ju...@gmail.com> on 2014/05/06 02:39:35 UTC

issue about cluster balance

Hi, mailing list:
I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it.
After that I started the balancer, but it moved only 7% of the data to the
new nodes in 20 hours. I have already set
dfs.datanode.balance.bandwidthPerSec to 10M, and the threshold is 10%. Why
does the balancing take so long?
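
For a rough sense of scale, the bandwidth setting alone bounds how much a single DataNode can move. The sketch below assumes that "10M" means 10 MiB/s (10485760 bytes per second, since the property is expressed in bytes per second) and that the cap applies to each DataNode; the function name is made up for illustration.

```python
def max_bytes_moved(bandwidth_bytes_per_sec: int, hours: float) -> float:
    """Upper bound on the bytes one DataNode can move for balancing in `hours`."""
    return bandwidth_bytes_per_sec * hours * 3600


BANDWIDTH = 10 * 1024 * 1024  # assumption: "10M" = 10 MiB/s

moved_gib = max_bytes_moved(BANDWIDTH, 20) / 1024**3
print(f"at most {moved_gib:.0f} GiB per DataNode in 20 hours")
# prints: at most 703 GiB per DataNode in 20 hours
```

Actual throughput is usually below this bound, because the balancer also works in iterations and limits how many blocks move concurrently.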

Re: issue about cluster balance

Posted by ch huang <ju...@gmail.com>.

I recorded the disk status before and after the balance, from one of the
source nodes and one of the destination nodes.

before
source node
/dev/sdd              1.8T 1009G  733G  58% /data/1
/dev/sde              1.8T 1005G  737G  58% /data/2
/dev/sda              1.8T  980G  762G  57% /data/3
/dev/sdb              1.8T  980G  762G  57% /data/4
/dev/sdc              1.8T  972G  769G  56% /data/5
/dev/sdf              1.8T  980G  762G  57% /data/6

destination node
/dev/sdb              1.8T  2.0G  1.7T   1% /data/1
/dev/sdc              1.8T  2.1G  1.7T   1% /data/2
/dev/sdd              1.8T  2.0G  1.7T   1% /data/3
/dev/sde              1.8T  2.2G  1.7T   1% /data/4
/dev/sdf              1.8T  2.2G  1.7T   1% /data/5

after
source node
/dev/sdd              1.8T  754G  988G  44% /data/1
/dev/sde              1.8T  736G 1006G  43% /data/2
/dev/sda              1.8T  730G 1011G  42% /data/3
/dev/sdb              1.8T  721G 1020G  42% /data/4
/dev/sdc              1.8T  721G 1021G  42% /data/5
/dev/sdf              1.8T  723G 1019G  42% /data/6

destination node
/dev/sdb              1.8T  388G  1.4T  23% /data/1
/dev/sdc              1.8T  381G  1.4T  22% /data/2
/dev/sdd              1.8T  378G  1.4T  22% /data/3
/dev/sde              1.8T  375G  1.4T  22% /data/4
/dev/sdf              1.8T  374G  1.4T  22% /data/5

What I wonder is why the source node and the destination node did not end
up equal, at about 30% each. The balance also took about 63 hours.
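
One likely explanation for the unequal result is the 10% threshold: the balancer stops once every DataNode's utilization is within the threshold of the cluster-wide average, not when all nodes are equal. The sketch below works through that arithmetic. It assumes that all five old nodes resemble the source node above (six 1.8T disks at about 57%) and all five new nodes resemble the destination node (five 1.8T disks at about 1%); the thread shows only one node of each kind, so these numbers are illustrative.

```python
DISK_TB = 1.8
THRESHOLD = 10.0  # percent, as set in the thread

# (node count, disks per node, used fraction): assumed values, see above
old_nodes = (5, 6, 0.57)
new_nodes = (5, 5, 0.01)


def capacity_and_used(count, disks, used_fraction):
    """Return total capacity and used space (both in TB) for a group of nodes."""
    capacity = count * disks * DISK_TB
    return capacity, capacity * used_fraction


cap_old, used_old = capacity_and_used(*old_nodes)
cap_new, used_new = capacity_and_used(*new_nodes)

average = 100 * (used_old + used_new) / (cap_old + cap_new)
low, high = average - THRESHOLD, average + THRESHOLD
print(f"cluster average {average:.1f}%, balanced band {low:.1f}% to {high:.1f}%")
# prints: cluster average 31.5%, balanced band 21.5% to 41.5%
```

Under these assumptions the destination node's 22% falls inside the band and the source node's 42-44% sits at its upper edge. df also counts non-HDFS data, so its percentages will not match the balancer's own figures exactly. A smaller -threshold value would pull the nodes closer together at the cost of a longer run.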

On Tue, May 6, 2014 at 12:38 PM, Rakesh R <ra...@huawei.com> wrote:

>  Could you give more details, such as:
>
> - Could you convert the 7% into the total amount of moved data, in MB?
>
> - Also, could you tell me the data movement per DN behind that 7%?
>
> - What values are shown for the ‘over-utilized’, ‘above-average’,
> ‘below-average’ and ‘under-utilized’ nodes? The balancer does the pairing
> based on these values.
>
> - Please tell me the cluster topology: SAME_NODE_GROUP or SAME_RACK? This
> matters when choosing the sourceNode vs balancerNode pairs as well as the
> proxy source.
>
> Did you see all the DNs being used for the block movement?
>
> - Did any exceptions occur during block movement?
>
> - How many iterations ran in these hours?
>
>
>
> -Rakesh
>
>
>
> *From:* ch huang [mailto:justlooks@gmail.com]
> *Sent:* 06 May 2014 06:10
> *To:* user@hadoop.apache.org
> *Subject:* issue about cluster balance
>
>
>
> Hi, mailing list:
>
> I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it.
> After that I started the balancer, but it moved only 7% of the data to the
> new nodes in 20 hours. I have already set
> dfs.datanode.balance.bandwidthPerSec to 10M, and the threshold is 10%. Why
> does the balancing take so long?
>

RE: issue about cluster balance

Posted by Rakesh R <ra...@huawei.com>.

Could you give more details, such as:

- Could you convert the 7% into the total amount of moved data, in MB?

- Also, could you tell me the data movement per DN behind that 7%?

- What values are shown for the ‘over-utilized’, ‘above-average’, ‘below-average’ and ‘under-utilized’ nodes? The balancer does the pairing based on these values.

- Please tell me the cluster topology: SAME_NODE_GROUP or SAME_RACK? This matters when choosing the sourceNode vs balancerNode pairs as well as the proxy source.

Did you see all the DNs being used for the block movement?

- Did any exceptions occur during block movement?

- How many iterations ran in these hours?

-Rakesh

From: ch huang [mailto:justlooks@gmail.com]
Sent: 06 May 2014 06:10
To: user@hadoop.apache.org
Subject: issue about cluster balance

Hi, mailing list:
I have a 5-node Hadoop cluster, and yesterday I added 5 new boxes to it. After that I started the balancer, but it moved only 7% of the data to the new nodes in 20 hours. I have already set dfs.datanode.balance.bandwidthPerSec to 10M, and the threshold is 10%. Why does the balancing take so long?
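
The four node groups asked about in the reply above can be sketched as follows. This is a simplified illustration of how DataNodes are bucketed by comparing each node's utilization with the cluster average and the threshold; the function name, node names and sample utilizations are made up, and the real balancer works from the capacity and DFS-used figures reported by the NameNode.

```python
def classify(utilization: float, average: float, threshold: float) -> str:
    """Bucket a DataNode by its utilization (all values in percent)."""
    if utilization > average + threshold:
        return "over-utilized"
    if utilization > average:
        return "above-average"
    if utilization >= average - threshold:
        return "below-average"
    return "under-utilized"


# Hypothetical utilizations; the average and threshold are illustrative too.
nodes = {"old-1": 57.0, "old-2": 40.0, "new-1": 25.0, "new-2": 1.0}
for name, used in nodes.items():
    print(name, classify(used, average=31.5, threshold=10.0))
# prints:
# old-1 over-utilized
# old-2 above-average
# new-1 below-average
# new-2 under-utilized
```

The balancer then pairs sources from the upper groups with targets from the lower groups, and it has nothing left to do once no node is over-utilized or under-utilized.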
