Posted to common-issues@hadoop.apache.org by GitBox <gi...@apache.org> on 2020/12/18 12:14:52 UTC

[GitHub] [hadoop] sodonnel commented on pull request #2562: HDFS-15737. Don't remove datanodes from outOfServiceNodeBlocks while checking in DatanodeAdminManager

sodonnel commented on pull request #2562:
URL: https://github.com/apache/hadoop/pull/2562#issuecomment-748053842


   It looks like this same logic also exists in trunk - could you submit a trunk PR / patch and then we can backport the change across all active branches?
   
   I am also a little confused about this problem. The map `outOfServiceNodeBlocks` is modified in a few places in the middle of the Cyclic Iteration. If that modification threw a ConcurrentModificationException, then I would expect us to be seeing this a lot - probably any time more than one node is added to decommission / maintenance.
   
   E.g., from `DatanodeAdminDefaultMonitor.java` on trunk, where `it` is a CyclicIterator over `outOfServiceNodeBlocks`:
   
    ```java
       while (it.hasNext() && !exceededNumBlocksPerCheck() && namesystem
           .isRunning()) {
         numNodesChecked++;
         final Map.Entry<DatanodeDescriptor, AbstractList<BlockInfo>>
             entry = it.next();
         final DatanodeDescriptor dn = entry.getKey();
         try {
           AbstractList<BlockInfo> blocks = entry.getValue();
           boolean fullScan = false;
           if (dn.isMaintenance() && dn.maintenanceExpired()) {
             // If maintenance expires, stop tracking it.
             dnAdmin.stopMaintenance(dn);
             toRemove.add(dn);
             continue;
           }
           if (dn.isInMaintenance()) {
             // The dn is IN_MAINTENANCE and the maintenance hasn't expired yet.
             continue;
           }
           if (blocks == null) {
             // This is a newly added datanode, run through its list to schedule
             // under-replicated blocks for replication and collect the blocks
             // that are insufficiently replicated for further tracking
             LOG.debug("Newly-added node {}, doing full scan to find " +
                 "insufficiently-replicated blocks.", dn);
             blocks = handleInsufficientlyStored(dn);
             outOfServiceNodeBlocks.put(dn, blocks);  // **** Modifies outOfServiceNodeBlocks
            ...
   ```
   
    Note that `outOfServiceNodeBlocks` is modified on the first pass, and so `it.next()` should throw a ConcurrentModificationException on the next iteration.
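    For reference, the fail-fast behaviour I am reasoning about can be reproduced in isolation. This is a minimal standalone sketch using a plain `TreeMap` (not Hadoop's CyclicIteration itself; the class and key names are illustrative only) - structurally modifying the map between calls to `it.next()` trips the iterator's modification check:

    ```java
    import java.util.ConcurrentModificationException;
    import java.util.Iterator;
    import java.util.Map;
    import java.util.TreeMap;

    public class CmeDemo {
      public static void main(String[] args) {
        Map<String, Integer> map = new TreeMap<>();
        map.put("dn1", 1);
        map.put("dn2", 2);

        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        it.next();              // first pass over the map
        map.put("dn3", 3);      // structural modification mid-iteration
        try {
          it.next();            // fail-fast iterator detects the change
          System.out.println("no exception");
        } catch (ConcurrentModificationException e) {
          System.out.println("ConcurrentModificationException thrown");
        }
      }
    }
    ```
    
    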
   
   Have you seen the ConcurrentModificationException logged due to this problem?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org