You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "sodonnel (via GitHub)" <gi...@apache.org> on 2023/05/11 21:05:49 UTC

[GitHub] [hadoop] sodonnel commented on pull request #5643: HDFS-17003. Erasure coding: invalidate wrong block after reporting bad blocks from datanode

sodonnel commented on PR #5643:
URL: https://github.com/apache/hadoop/pull/5643#issuecomment-1544680459

   If I understand correctly, for a replicated block, if there are two corrupt block the code in InvalidateCorruptReplicas will be called when the block has been replicated correctly. At that point there will be 3 good replicas and 2 corrupt stored in the corruptReplicas map. The code in the above method will then iterate over those two and "invalidate" them on the datanodes they are stored in.
   
   For EC, the same applies, however we are sending the blockID + index of the last reported replica to both DNs. All we store in the corruptReplica map, is the block group ID (ie the block ID with the replica index stripped out) and then the list of nodes hosting it. At this point in the code we don't know what index is on each of the nodes hosting a corrupt replica. Is this correct?
   
   Its not clear to me how the fix in this PR fixes the problem:
   
   ```
           if (blk.isStriped()) {
             DatanodeStorageInfo[] storages = getStorages(blk);
             for (DatanodeStorageInfo storage : storages) {
               final Block b = getBlockOnStorage(blk, storage);
               if (b != null) {
                 reported = b;
               }
             }
           }
   ```
   For each node stored, we get the storages for the block, which will be the nodes hosting it. Then we getStorageOnBlock and it is sure to return non-null for each of the storages, as they all host a block in the group, right?
   
   Do we not need to somehow find the replica index for the block for each of the nodes listed, and then setup the "reported block" with the correct blockID + index for that node, passing that to invalidate?
   
   Would something like this work - NOTE - I have not tested this at all:
   
   ```
   --- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
   +++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java
   @@ -3536,9 +3536,25 @@ private void invalidateCorruptReplicas(BlockInfo blk, Block reported,
        // ConcurrentModificationException, when the block is removed from the node
        DatanodeDescriptor[] nodesCopy =
            nodes.toArray(new DatanodeDescriptor[nodes.size()]);
   +
   +    DatanodeStorageInfo[] storages = null;
   +    if (blk.isStriped()) {
   +      storages = getStorages(blk);
   +    }
   +
        for (DatanodeDescriptor node : nodesCopy) {
   +      Block blockToInvalidate = reported;
   +      if (blk.isStriped()) {
   +        for (DatanodeStorageInfo s : storages) {
   +          if (s.getDatanodeDescriptor() == node) {
   +            blockToInvalidate = getBlockOnStorage(blk, s);
   +            break;
   +          }
   +        }
   +      }
   +
          try {
   -        if (!invalidateBlock(new BlockToMarkCorrupt(reported, blk, null,
   +        if (!invalidateBlock(new BlockToMarkCorrupt(blockToInvalidate, blk, null,
                Reason.ANY), node, numberReplicas)) {
              removedFromBlocksMap = false;
            }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org