Posted to hdfs-issues@hadoop.apache.org by "Kevin Wikant (Jira)" <ji...@apache.org> on 2022/07/16 16:16:00 UTC

[jira] [Updated] (HDFS-16664) Use correct GenerationStamp when invalidating corrupt block replica

     [ https://issues.apache.org/jira/browse/HDFS-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Wikant updated HDFS-16664:
--------------------------------
    Description: 
While backporting HDFS-16064 to an older Hadoop version, I found that the new unit test "testDeleteCorruptReplicaForUnderReplicatedBlock" failed unexpectedly.

While investigating this test failure, I identified a bug in HDFS corrupt replica invalidation that results in the following datanode exception (a simplified sketch of the datanode-side check follows the trace):
{quote}2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] WARN  datanode.DataNode (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s):
0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched, existing replica is blk_1073741825_1001
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
        at java.lang.Thread.run(Thread.java:750)
{quote}
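For context, the datanode-side check that produces this error can be modeled roughly as below. This is a minimal, self-contained sketch with hypothetical names, not the actual FsDatasetImpl code: the datanode looks up its replica by block ID alone and refuses the delete when the generationStamp in the DNA_INVALIDATE command does not match the replica it actually holds.
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the datanode-side generation stamp check (illustrative only;
 * the real logic lives in FsDatasetImpl#invalidate). Replicas are looked up
 * by block ID, and the delete is refused if the generation stamp requested by
 * the NameNode differs from the generation stamp of the on-disk replica.
 */
public class GenStampCheckDemo {
  // blockId -> generation stamp of the locally stored, finalized replica
  private final Map<Long, Long> localReplicas = new HashMap<>();

  void addReplica(long blockId, long genStamp) {
    localReplicas.put(blockId, genStamp);
  }

  void invalidate(long blockId, long requestedGenStamp) {
    Long localGenStamp = localReplicas.get(blockId);
    if (localGenStamp == null) {
      System.out.println("No replica found for block " + blockId);
      return;
    }
    if (localGenStamp.longValue() != requestedGenStamp) {
      // Mirrors the "GenerationStamp not matched" error seen in the logs above.
      System.out.println("Failed to delete replica blk_" + blockId + "_" + requestedGenStamp
          + ": GenerationStamp not matched, existing replica is blk_" + blockId + "_" + localGenStamp);
      return;
    }
    localReplicas.remove(blockId);
    System.out.println("Deleted blk_" + blockId + "_" + requestedGenStamp);
  }

  public static void main(String[] args) {
    GenStampCheckDemo dn = new GenStampCheckDemo();
    dn.addReplica(1073741825L, 1001L);   // corrupt replica finalized at genStamp 1001
    dn.invalidate(1073741825L, 1005L);   // NameNode asks with genStamp 1005 -> refused
    dn.invalidate(1073741825L, 1001L);   // matching genStamp -> deleted
  }
}
{code}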
The issue is that the Namenode sends the wrong generationStamp to the datanode. By adding some extra logging, I was able to determine the root cause:
 * the generationStamp sent in the DNA_INVALIDATE is based on the [generationStamp of the block sent in the block report|https://github.com/apache/hadoop/blob/8774f178686487007dcf8c418c989b785a529000/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L3733]
 * the problem is that the datanode with the corrupt block replica (which receives the DNA_INVALIDATE) is not necessarily the same datanode that sent the block report
 * this causes the above exception whenever the corrupt block replica on the datanode receiving the DNA_INVALIDATE and the block replica on the datanode that sent the block report have different generationStamps (see the sketch after this list)
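To make the faulty sequence concrete, the following standalone sketch (hypothetical class and variable names; the real flow runs through BlockManager#invalidateCorruptReplicas) reproduces the mismatch: the generationStamp from the reporting datanode's block report is reused for a different datanode whose corrupt replica carries an older generationStamp.
{code:java}
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of the buggy NameNode-side flow (simplified, hypothetical names):
 * the generation stamp of the block in the block report is reused for every
 * datanode that holds a corrupt replica of that block.
 */
public class InvalidateCorruptReplicasBugDemo {
  public static void main(String[] args) {
    long blockId = 1073741825L;
    long reportedGenStamp = 1005L;                        // from 127.0.0.1:61419's block report
    List<String> corruptReplicaNodes = List.of("127.0.0.1:61415");

    // Generation stamps of the replicas actually stored on each datanode.
    Map<String, Long> localGenStamps = new HashMap<>();
    localGenStamps.put("127.0.0.1:61419", 1005L);         // reporting node, up-to-date replica
    localGenStamps.put("127.0.0.1:61415", 1001L);         // corrupt replica with stale genStamp

    for (String dn : corruptReplicaNodes) {
      // BUG: the DNA_INVALIDATE sent to dn carries reportedGenStamp (1005),
      // even though dn's corrupt replica was finalized at genStamp 1001.
      long localGenStamp = localGenStamps.get(dn);
      if (localGenStamp != reportedGenStamp) {
        System.out.println(dn + " cannot delete blk_" + blockId + "_" + reportedGenStamp
            + ": GenerationStamp not matched, existing replica is blk_"
            + blockId + "_" + localGenStamp);
      } else {
        System.out.println(dn + " deletes blk_" + blockId + "_" + reportedGenStamp);
      }
    }
  }
}
{code}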

The solution is to store each corrupt replica's generationStamp in the CorruptReplicasMap and then use that stored generationStamp when sending the DNA_INVALIDATE to the datanode, as sketched below.
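A minimal sketch of the proposed fix, assuming the CorruptReplicasMap is extended to remember the generationStamp of each corrupt replica per datanode (the class and method names below are illustrative, not the actual Hadoop API):
{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the fix (illustrative only; the real class is
 * org.apache.hadoop.hdfs.server.blockmanagement.CorruptReplicasMap):
 * remember the generation stamp of each corrupt replica per datanode and use
 * that stored value when building the DNA_INVALIDATE command.
 */
public class CorruptReplicaGenStampFixDemo {
  // blockId -> (datanode -> generation stamp of the corrupt replica on that datanode)
  private final Map<Long, Map<String, Long>> corruptReplicas = new HashMap<>();

  void addToCorruptReplicasMap(long blockId, String datanode, long replicaGenStamp) {
    corruptReplicas.computeIfAbsent(blockId, k -> new HashMap<>())
        .put(datanode, replicaGenStamp);
  }

  /** Returns the generation stamp to send in DNA_INVALIDATE for this datanode. */
  long genStampForInvalidate(long blockId, String datanode, long reportedGenStamp) {
    Map<String, Long> nodes = corruptReplicas.get(blockId);
    if (nodes != null && nodes.containsKey(datanode)) {
      return nodes.get(datanode);    // the replica's own genStamp, not the reported one
    }
    return reportedGenStamp;
  }

  public static void main(String[] args) {
    CorruptReplicaGenStampFixDemo map = new CorruptReplicaGenStampFixDemo();
    long blockId = 1073741825L;
    // The corrupt replica on 127.0.0.1:61415 was recorded at genStamp 1001.
    map.addToCorruptReplicasMap(blockId, "127.0.0.1:61415", 1001L);
    // A later block report from 127.0.0.1:61419 carries genStamp 1005, but the
    // invalidate sent to 127.0.0.1:61415 now uses the stored genStamp 1001.
    System.out.println("DNA_INVALIDATE genStamp = "
        + map.genStampForInvalidate(blockId, "127.0.0.1:61415", 1005L));
  }
}
{code}
With the stored value, the DNA_INVALIDATE sent to 127.0.0.1:61415 carries genStamp 1001, which matches the on-disk replica, so the delete succeeds and decommissioning can complete.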

 
h2. Failed Test - Before the fix
{quote}> mvn test -Dtest=TestDecommission#testDeleteCorruptReplicaForUnderReplicatedBlock

 

[INFO] Results:
[INFO] 
[ERROR] Failures: 
[ERROR]   TestDecommission.testDeleteCorruptReplicaForUnderReplicatedBlock:2035 Node 127.0.0.1:61366 failed to complete decommissioning. numTrackedNodes=1 , numPendingNodes=0 , adminState=Decommission In Progress , nodesWithReplica=[127.0.0.1:61366, 127.0.0.1:61419]
{quote}
Logs:
{quote}> cat target/surefire-reports/org.apache.hadoop.hdfs.TestDecommission-output.txt | grep 'Expected Replicas:\|XXX\|FINALIZED\|Block now\|Failed to delete'

2022-07-16 08:07:45,891 [Listener at localhost/61378] INFO  hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1942)) - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1 live replica on 127.0.0.1:61366
2022-07-16 08:07:45,913 [Listener at localhost/61378] INFO  hdfs.TestDecommission (TestDecommission.java:testDeleteCorruptReplicaForUnderReplicatedBlock(1974)) - Block now has 2 corrupt replicas on [127.0.0.1:61370 , 127.0.0.1:61375] and 1 decommissioning replica on 127.0.0.1:61366
XXX invalidateBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX postponeBlock dn=127.0.0.1:61415 , blk=1073741825_1001
XXX invalidateBlock dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addToInvalidates dn=127.0.0.1:61419 , blk=1073741825_1003
XXX addBlocksToBeInvalidated dn=127.0.0.1:61419 , blk=1073741825_1003
XXX rescanPostponedMisreplicatedBlocks blk=1073741825_1005
XXX DNA_INVALIDATE dn=/127.0.0.1:61419 , blk=1073741825_1003
XXX invalidate(on DN) dn=/127.0.0.1:61419 , invalidBlk=blk_1073741825_1003 , blkByIdAndGenStamp = FinalizedReplica, blk_1073741825_1003, FINALIZED
2022-07-16 08:07:49,084 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] INFO  impl.FsDatasetAsyncDiskService (FsDatasetAsyncDiskService.java:deleteAsync(226)) - Scheduling blk_1073741825_1003 replica FinalizedReplica, blk_1073741825_1003, FINALIZED
XXX addBlock dn=127.0.0.1:61419 , blk=1073741825_1005   <<<  block report is coming from 127.0.0.1:61419 which has genStamp=1005
XXX invalidateCorruptReplicas dn=127.0.0.1:61415 , reported_blk=1073741825_1005   <<<  corrupt replica is on 127.0.0.1:61415 which is expecting genStamp=1001
XXX addToInvalidates dn=127.0.0.1:61415 , blk=1073741825_1005
2022-07-16 08:07:49,431 [DatanodeAdminMonitor-0] INFO  BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
XXX addBlocksToBeInvalidated dn=127.0.0.1:61415 , blk=1073741825_1005   <<<  Namenode sends wrong genStamp to 127.0.0.1:61415
XXX DNA_INVALIDATE dn=/127.0.0.1:61415 , blk=1073741825_1005
XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 , blkByIdAndGenStamp = null
XXX invalidate(on DN) dn=/127.0.0.1:61415 , invalidBlk=blk_1073741825_1005 , blkById = FinalizedReplica, blk_1073741825_1001, FINALIZED
2022-07-16 08:07:52,041 [BP-958471676-X-1657973243350 heartbeating to localhost/127.0.0.1:61365] WARN  datanode.DataNode (BPServiceActor.java:processCommand(887)) - Error processing datanode Command
java.io.IOException: Failed to delete 1 (out of 1) replica(s):
0) Failed to delete replica blk_1073741825_1005: GenerationStamp not matched, existing replica is blk_1073741825_1001
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2139)
        at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.invalidate(FsDatasetImpl.java:2034)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActive(BPOfferService.java:735)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.processCommandFromActor(BPOfferService.java:680)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processCommand(BPServiceActor.java:883)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:678)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:849)
        at java.lang.Thread.run(Thread.java:750)
2022-07-16 08:07:52,384 [DataXceiver for client  at /127.0.0.1:61434 [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO  datanode.DataNode (DataXceiver.java:writeBlock(939)) - opWriteBlock BP-958471676-X-1657973243350:blk_1073741825_1005 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state FINALIZED and thus cannot be created.
2022-07-16 08:07:52,385 [DataXceiver for client  at /127.0.0.1:61434 [Receiving block BP-958471676-X-1657973243350:blk_1073741825_1005]] INFO  datanode.DataNode (DataXceiver.java:run(307)) - 127.0.0.1:61415:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:61434 dst: /127.0.0.1:61415; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-958471676-X-1657973243350:blk_1073741825_1005 already exists in state FINALIZED and thus cannot be created.
2022-07-16 08:07:54,422 [DatanodeAdminMonitor-0] INFO  BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
...
2022-07-16 08:08:24,426 [DatanodeAdminMonitor-0] INFO  BlockStateChange (DatanodeAdminManager.java:logBlockReplicationInfo(417)) - Block: blk_1073741825_1005, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 1, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: 127.0.0.1:61366 127.0.0.1:61419 , Current Datanode: 127.0.0.1:61366, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
{quote}
Note the inline "<<<" comments above, which illustrate the bug.

 


> Use correct GenerationStamp when invalidating corrupt block replica
> -------------------------------------------------------------------
>
>                 Key: HDFS-16664
>                 URL: https://issues.apache.org/jira/browse/HDFS-16664
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kevin Wikant
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
