Posted to hdfs-dev@hadoop.apache.org by "Andrew Wang (JIRA)" <ji...@apache.org> on 2014/07/29 01:05:39 UTC

[jira] [Resolved] (HDFS-6626) Node is marked decommissioned if it becomes dead when it is being decommissioned

     [ https://issues.apache.org/jira/browse/HDFS-6626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Wang resolved HDFS-6626.
-------------------------------

    Resolution: Not a Problem

Thanks for checking in on this, Ming. Since the dead node list seems to be sufficient, let's close this JIRA out. Please reopen if a use case re-emerges.

> Node is marked decommissioned if it becomes dead when it is being decommissioned
> --------------------------------------------------------------------------------
>
>                 Key: HDFS-6626
>                 URL: https://issues.apache.org/jira/browse/HDFS-6626
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ming Ma
>
> I'm not sure whether this is by design, but it isn't intuitive. The scenario is this: you start decommissioning a node; while it is being decommissioned, the node becomes dead from the NN's point of view; right after that, the NN marks the node as decommissioned. On the web UI, administrators will then assume the decommission completed successfully. This happens because decommission is considered done once no blocks are left for the DN.
> {noformat}
> BlockManager.java
>   boolean isReplicationInProgress(DatanodeDescriptor srcNode) {
>     boolean status = false;
> ...
>     final Iterator<? extends Block> it = srcNode.getBlockIterator();
>     while(it.hasNext()) {
> ...
>       // set status if any block is still under replication
>     }
> ...
>     return status;
>   }
> {noformat}
> The question is whether we should mark the dead node as decommissioned (the current behavior) or as "decommission aborted". When administrators decommission nodes, they want to know both the status of the decommission and the health of the nodes that are still being decommissioned. If they can detect a decommission failure earlier, they may be able to act earlier; for example, if a ToR switch has issues during decommission, administrators could quickly spot a group of "decommission aborted" nodes in the same rack. The same information is still available by joining the decommission node list with the recent dead node list on the web UI; it is just less convenient.
> Suggestions?
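
To make the reported behavior concrete, here is a minimal, self-contained sketch. It is not the HDFS implementation: `Node`, `AdminState`, `currentBehavior` and `proposedBehavior` are hypothetical stand-ins, and it assumes, as the description says, that a dead node is left with no blocks. `DECOMMISSION_ABORTED` is the state proposed in the report, not one known to exist in HDFS.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified model of the scenario; not the real HDFS classes. */
final class DecommissionSketch {

    enum AdminState { DECOMMISSION_INPROGRESS, DECOMMISSIONED, DECOMMISSION_ABORTED }

    /** Stand-in for a datanode: one flag per block (true = still under replication). */
    static final class Node {
        final List<Boolean> blocksUnderReplication;
        final boolean alive;

        Node(List<Boolean> blocksUnderReplication, boolean alive) {
            this.blocksUnderReplication = blocksUnderReplication;
            this.alive = alive;
        }
    }

    /** Mirrors the shape of the quoted method: status is set only inside the loop. */
    static boolean isReplicationInProgress(Node node) {
        boolean status = false;
        Iterator<Boolean> it = node.blocksUnderReplication.iterator();
        while (it.hasNext()) {
            if (it.next()) {
                status = true;
            }
        }
        return status;
    }

    /** Reported behavior: no replication in progress means the node is done. */
    static AdminState currentBehavior(Node node) {
        return isReplicationInProgress(node)
                ? AdminState.DECOMMISSION_INPROGRESS
                : AdminState.DECOMMISSIONED;
    }

    /** Suggested alternative (assumption): check liveness before declaring success. */
    static AdminState proposedBehavior(Node node) {
        if (!node.alive) {
            return AdminState.DECOMMISSION_ABORTED;
        }
        return currentBehavior(node);
    }

    public static void main(String[] args) {
        // A node that died mid-decommission: assumed to have an empty block list.
        Node dead = new Node(Collections.<Boolean>emptyList(), false);
        System.out.println(currentBehavior(dead));  // DECOMMISSIONED
        System.out.println(proposedBehavior(dead)); // DECOMMISSION_ABORTED
    }
}
```

With an empty block list the loop body never runs, so `status` stays `false` and the dead node is indistinguishable from one that finished cleanly.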

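The manual workaround mentioned in the description, joining the decommission node list with the recent dead node list, could be sketched as a simple set intersection. The host names are made up, and the two lists are assumed to have been collected by hand (for example, copied from the web UI); no HDFS API is used here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper; the node lists are assumed to be gathered manually. */
final class DecommJoinSketch {

    /** Returns nodes present in both lists, in the order of the decommission list. */
    static Set<String> deadDuringDecommission(List<String> decommNodes, List<String> deadNodes) {
        Set<String> result = new LinkedHashSet<>(decommNodes);
        result.retainAll(new HashSet<>(deadNodes));
        return result;
    }

    public static void main(String[] args) {
        // Made-up host names for illustration only.
        List<String> decomm = Arrays.asList("dn1.rack1", "dn2.rack1", "dn3.rack2");
        List<String> dead = Arrays.asList("dn2.rack1", "dn9.rack4", "dn1.rack1");
        System.out.println(deadDuringDecommission(decomm, dead)); // [dn1.rack1, dn2.rack1]
    }
}
```

Nodes in the result are the candidates for "decommissioned because they died" rather than "decommissioned because replication finished".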

