Posted to hdfs-dev@hadoop.apache.org by "liuyiyang (JIRA)" <ji...@apache.org> on 2017/07/12 09:21:00 UTC

[jira] [Created] (HDFS-12128) Namenode failover may make balancer's efforts be in vain

liuyiyang created HDFS-12128:
--------------------------------

             Summary: Namenode failover may make balancer's efforts be in vain
                 Key: HDFS-12128
                 URL: https://issues.apache.org/jira/browse/HDFS-12128
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: balancer & mover
    Affects Versions: 2.6.0
            Reporter: liuyiyang


The problem can be reproduced as follows:
1. Take an HA cluster with imbalanced datanode usage that we want to balance with "start-balancer.sh";
2. Before starting the balancer, trigger a namenode failover; this makes the new active namenode mark all datanodes as stale;
3. Start the balancer to even out datanode usage;
4. While the balancer is running, the usage of under-utilized datanodes increases, but the usage of over-utilized datanodes stays unchanged for a long time.

Since all datanodes are marked as stale, replica deletion is postponed on them. During balancing, the replicas on the source datanodes can't be deleted immediately,
so the total usage of the cluster increases and won't decrease until the datanodes' stale state is cleared.
When the datanodes send their next block report to the namenode (the default interval is 6h), the active namenode clears their stale state. I found that if a replica on the source datanode can't be deleted immediately during the OP_REPLACE operation via the del_hint sent to the namenode,
the namenode later schedules deletion of the replica on the datanode with the least remaining space instead of the replica on the source datanode. Unfortunately, the datanodes with the least remaining space may be the balancer's target datanodes, which leads to imbalanced datanode usage again.
If the balancer finishes before the next block report, all postponed over-replicated replicas will be deleted based on the remaining space of the datanodes, which may make the balancer's efforts fruitless.
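
The effect described above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not HDFS code: the names Datanode, run_balancer and simulate are hypothetical, the cluster has one source and one target, and the only deletion rule modeled is "honor del_hint if the datanodes are not stale, otherwise delete later from the holder with the least remaining space". The capacities are chosen so that the low-utilization target has less absolute free space than the high-utilization source.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Datanode:
    """Hypothetical stand-in for a datanode; sizes are in arbitrary units."""
    name: str
    capacity: int
    used: int

    @property
    def remaining(self) -> int:
        return self.capacity - self.used


def run_balancer(source: Datanode, target: Datanode,
                 block_size: int, moves: int, stale: bool) -> None:
    """Move `moves` blocks from source to target in the toy model."""
    postponed: List[List[Datanode]] = []
    for _ in range(moves):
        # The copy to the target always happens right away.
        target.used += block_size
        if stale:
            # Assumption: deletion is postponed and the del_hint is lost.
            postponed.append([source, target])
        else:
            # Assumption: del_hint is honored, the source replica is removed.
            source.used -= block_size

    # After the stale state is cleared, each postponed excess replica is
    # removed from the holder with the least remaining space.
    for holders in postponed:
        victim = min(holders, key=lambda dn: dn.remaining)
        victim.used -= block_size


def simulate(stale: bool) -> Tuple[int, int]:
    """Return (source.used, target.used) after four 10-unit block moves."""
    source = Datanode("source", capacity=1000, used=900)  # 90% utilized
    target = Datanode("target", capacity=100, used=10)    # 10% utilized
    run_balancer(source, target, block_size=10, moves=4, stale=stale)
    return source.used, target.used


if __name__ == "__main__":
    print("not stale:", simulate(stale=False))
    print("stale:    ", simulate(stale=True))
```

In this model the non-stale run ends with the source at 860 and the target at 50, while the stale run ends with the source still at 900 and the target back at 10, because every postponed deletion lands on the target. Real clusters have many datanodes and further placement constraints, so the model only shows the direction of the effect.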



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-help@hadoop.apache.org