Posted to hdfs-dev@hadoop.apache.org by "Eli Collins (JIRA)" <ji...@apache.org> on 2012/09/12 20:39:07 UTC

[jira] [Created] (HDFS-3931) TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken

Eli Collins created HDFS-3931:
---------------------------------

             Summary: TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
                 Key: HDFS-3931
                 URL: https://issues.apache.org/jira/browse/HDFS-3931
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: test
    Affects Versions: 2.0.0-alpha
            Reporter: Eli Collins
            Assignee: Andy Isaacson


Per Andy's comment on HDFS-3902:

TestDatanodeBlockScanner still fails in about 1 of 5 runs, in testBlockCorruptionRecoveryPolicy2. That's due to a separate test issue also uncovered by HDFS-3828.
The failure scenario for this one is a bit trickier. I think I've captured it below:

- The test corrupts 2 of the 3 replicas.
- The client reports a bad block.
- The NN asks a DN to re-replicate, and randomly picks the other corrupt replica as the source.
- The DN notices the incoming replica is corrupt and reports it as a bad block, but does not inform the NN that re-replication failed.
- The NN keeps the block on pendingReplications.
- The BP scanner wakes up on both DNs with corrupt blocks, and both report corruption. The NN reports both as duplicates of earlier reports: one from the client and one from the DN report above.
- Since the block is on pendingReplications, the NN does not schedule another replication.
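
The scenario above can be illustrated with a toy model. This is only a sketch, not Hadoop's actual NameNode or BlockManager code: the class ToyNameNode, its fields, and the datanode names dn1/dn2 are all hypothetical, and the model assumes a single block whose pending entry is never cleared. It shows why only one replication is ever scheduled once the block sits on the pending list.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the failure scenario; all names here are hypothetical. */
final class PendingReplicationSketch {

    /** Minimal stand-in for the NN's bookkeeping of a single block. */
    static final class ToyNameNode {
        final Set<String> corruptReplicas = new HashSet<>();
        boolean pendingReplication = false; // models the block being on pendingReplications
        int replicationsScheduled = 0;

        /** Records a bad-block report; returns false if it duplicates an earlier one. */
        boolean reportBadBlock(String datanode) {
            boolean isNew = corruptReplicas.add(datanode);
            maybeScheduleReplication();
            return isNew;
        }

        /** Schedules a replication only if none is already pending for the block. */
        void maybeScheduleReplication() {
            if (pendingReplication) {
                return; // already pending: no new work is scheduled
            }
            pendingReplication = true;
            replicationsScheduled++;
        }
    }

    /** Replays the steps from the report and returns how many replications were scheduled. */
    static int simulate() {
        ToyNameNode nn = new ToyNameNode();

        // The test corrupts the replicas on dn1 and dn2; a third replica stays good.
        // The client reports the bad replica on dn1, so the NN schedules a replication.
        nn.reportBadBlock("dn1");

        // The NN picked dn2 (also corrupt) as the source. The receiving DN reports
        // dn2's replica as bad but never tells the NN the transfer failed, so
        // pendingReplication is never cleared in this model.
        nn.reportBadBlock("dn2");

        // The scanners on dn1 and dn2 now report corruption; both are duplicates.
        nn.reportBadBlock("dn1");
        nn.reportBadBlock("dn2");

        return nn.replicationsScheduled;
    }

    public static void main(String[] args) {
        System.out.println("replications scheduled: " + simulate());
    }
}
```

In this model the count stays at 1 because nothing ever removes the block from the pending state after the failed transfer, which mirrors the last step of the scenario.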

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Resolved] (HDFS-3931) TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HDFS-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins resolved HDFS-3931.
-------------------------------

    Resolution: Fixed
    
> TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken
> -------------------------------------------------------------
>
>                 Key: HDFS-3931
>                 URL: https://issues.apache.org/jira/browse/HDFS-3931
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Andy Isaacson
>            Priority: Minor
>             Fix For: 2.0.3-alpha
>
>         Attachments: hdfs3931-1.txt, hdfs3931-2.txt, hdfs3931-3.txt, hdfs3931.txt


[jira] [Reopened] (HDFS-3931) TestDatanodeBlockScanner#testBlockCorruptionPolicy2 is broken

Posted by "Eli Collins (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HDFS-3931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins reopened HDFS-3931:
-------------------------------

    
