You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ozone.apache.org by GitBox <gi...@apache.org> on 2021/06/01 20:32:57 UTC

[GitHub] [ozone] sodonnel commented on pull request #2286: HDDS-5277. Recon shows operational status as "DECOMMISSIONING" for "DECOMMISSIONED" DNs

sodonnel commented on pull request #2286:
URL: https://github.com/apache/ozone/pull/2286#issuecomment-852426454


   I think there is another scenario this change won't catch.
   
   1. Node is put to maintenance with a timeout value.
   2. Node goes to IN_MAINTENANCE and is then stopped and goes dead.
   3. Node does not come back in time, and SCM automatically expires the maintenance, moving the host to IN_SERVICE + DEAD.
   
   In this case the dead node handler would fire once on Recon, and populate the dead nodes array, then the state would be checked and the array cleared. The later transition to IN_SERVICE + DEAD would never get seen by Recon.
   
   I wonder if it would be simpler to forget about the dead node handler on Recon, and just sync up all nodes on SCM with the nodes on Recon on each run (one every 5 minutes I think). Even with a few 1000 nodes, this should not have a big overhead.
   
   In saying that, using the dead node handler is a nice optimisation.
   
   We could ask SCM for a list of nodes which are only in the DEAD state and sync up that group. There is an existing API for that in SCMNodeManager:
   
   ```
    /**
      * Returns all datanode that are in the given states. Passing null for one of
      * of the states acts like a wildcard for that state. This function works by
      * taking a snapshot of the current collection and then returning the list
      * from that collection. This means that real map might have changed by the
      * time we return this list.
      *
      * @param opState The operational state of the node
      * @param health The health of the node
      * @return List of Datanodes that are known to SCM in the requested states.
      */
     @Override
     public List<DatanodeDetails> getNodes(
         NodeOperationalState opState, NodeState health) {
       return nodeStateManager.getNodes(opState, health)
           .stream()
           .map(node -> (DatanodeDetails)node).collect(Collectors.toList());
     }
   ```
   
   I think that would cover both scenarios.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@ozone.apache.org
For additional commands, e-mail: issues-help@ozone.apache.org