Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/07/02 09:15:04 UTC

[jira] Created: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas
-----------------------------------------------------------------------------------------------------

                 Key: HADOOP-1557
                 URL: https://issues.apache.org/jira/browse/HADOOP-1557
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


Suppose a block has three replicas and two of them are corrupted, and the replication factor of the file is then reduced to 2. The filesystem should preferentially delete the two corrupted replicas; otherwise the file could end up corrupted.

One option would be to have the datanode periodically validate all of its blocks against their corresponding CRCs. Another option would be to have the setReplication call validate the existing replicas before deleting the excess ones.
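
To make the second option concrete, here is a rough sketch of what choosing excess replicas could look like if corrupted copies were verified and dropped first. All class and method names below are hypothetical, not existing HDFS code, and the CRC check stands in for whatever block-level checksum the datanode stores.

import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

/** Hypothetical sketch: prefer corrupted replicas when trimming excess copies. */
class ExcessReplicaChooser {

  /** A replica's bytes plus the checksum recorded when the block was written. */
  static class Replica {
    final String datanode;
    final byte[] data;
    final long expectedCrc;

    Replica(String datanode, byte[] data, long expectedCrc) {
      this.datanode = datanode;
      this.data = data;
      this.expectedCrc = expectedCrc;
    }

    /** Recompute the CRC and compare it with the stored value. */
    boolean isValid() {
      CRC32 crc = new CRC32();
      crc.update(data);
      return crc.getValue() == expectedCrc;
    }
  }

  /** Pick which replicas to delete, taking corrupted ones before valid ones. */
  static List<Replica> chooseExcess(List<Replica> replicas, int targetReplication) {
    int excess = replicas.size() - targetReplication;
    List<Replica> toDelete = new ArrayList<>();
    // First pass: delete replicas that fail their CRC check.
    for (Replica r : replicas) {
      if (toDelete.size() < excess && !r.isValid()) {
        toDelete.add(r);
      }
    }
    // Second pass: only if still over-replicated, delete valid replicas too.
    for (Replica r : replicas) {
      if (toDelete.size() < excess && !toDelete.contains(r)) {
        toDelete.add(r);
      }
    }
    return toDelete;
  }
}

With this ordering, reducing the replication factor from 3 to 2 in the example above would drop the two corrupted replicas and keep the valid one.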



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509957 ] 

Doug Cutting commented on HADOOP-1557:
--------------------------------------

> the bug I was pointing to occurs when setReplication() is called to decrease the number of replicas

Sorry it's taken me so long to understand this!  Yes, I see the issue now.  I'm not sure I have great sympathy for it yet.  In general, as one decreases the number of replicas, the chance that all of them are corrupt increases.  After HADOOP-1134 we should see corruption primarily from disk errors, and a disk can start failing at any time.  Validating some replicas before others are removed would somewhat reduce the chance that all remaining replicas are corrupt, but not dramatically, so I'm not convinced it's worth the expense.

Disk errors are not entirely random.  When we see a single error from a disk, we're likely to see more from that disk.  So keeping statistics on the number of corruptions identified per datanode would be very valuable.  And automatically taking datanodes offline when corruptions exceed some threshold might go further toward addressing this issue than explicitly checking blocks as replication factors are reduced, since it would remove replicas from failing drives *before* they're read, replicated, de-replicated, etc.
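
A minimal sketch of that bookkeeping, assuming a hypothetical tracker on the namenode (neither the class nor the threshold exists in the current code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: per-datanode corruption counters with an offline threshold. */
class CorruptionTracker {
  private final Map<String, AtomicInteger> corruptionsPerNode = new ConcurrentHashMap<>();
  private final int offlineThreshold;

  CorruptionTracker(int offlineThreshold) {
    this.offlineThreshold = offlineThreshold;
  }

  /** Record one corrupt-replica report; returns true if the node should be taken offline. */
  boolean reportCorruption(String datanodeId) {
    int count = corruptionsPerNode
        .computeIfAbsent(datanodeId, id -> new AtomicInteger())
        .incrementAndGet();
    return count >= offlineThreshold;
  }
}

Whatever decommissions the node would act on the return value; the threshold would need tuning so that a single transient error does not take a datanode offline.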



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509698 ] 

dhruba borthakur commented on HADOOP-1557:
------------------------------------------

The bug is that when a setReplication() command is sent to the NameNode, no data blocks are being read by the client or the datanode, so a replica that is corrupt on a datanode is not known to the namenode at that time.



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509974 ] 

Doug Cutting commented on HADOOP-1557:
--------------------------------------

> a periodic disk block validation by the Datanode might be handy in detecting these types of problems

Yes, it would, especially if the filesystem has been idle or offline for a time.  But for an actively used filesystem, normal use might identify failing drives just as effectively.  Scanning the research on disk failures, it looks like drives more frequently return a read error than corrupt data.  Currently, it looks like a datanode shuts down when it encounters a read error, which is probably sufficient: the OS shouldn't return a read error unless it has already retried several times, the drive's ECC has failed, etc.



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509952 ] 

dhruba borthakur commented on HADOOP-1557:
------------------------------------------

That sounds like a good policy. However, the bug I was pointing to occurs when setReplication() is called to *decrease* the number of replicas. In this case, no data blocks are read.



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509949 ] 

Doug Cutting commented on HADOOP-1557:
--------------------------------------

> when a setReplication() command is sent to the NameNode, no data blocks are being read

Right.  But a setReplication() triggers replications, and, when those replications happen, the data is read.  If, when writing the replica, the checksum of the received data does not match the checksum sent with that data, the receiving datanode should report to the namenode that the data was corrupt and abort the replication.  This would cause the source block to be removed (provided there are more replicas) and the namenode to initiate new replications from a different source.

After HADOOP-1134, datanodes should always validate checksums as blocks are written.  Whenever there's a mismatch, the write should be aborted.  If the write is a replication (as opposed to an initial write), the mismatch should be reported to the namenode.  Does that sound like the right policy to you?
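
A sketch of that receive-side check, with a made-up callback standing in for the datanode-to-namenode report (none of these names are the real protocol):

import java.util.zip.CRC32;

/** Hypothetical sketch of the receive-side check described above. */
class ReplicaReceiver {

  /** Stand-in for the datanode-to-namenode report; not a real Hadoop interface. */
  interface NameNodeReport {
    void reportCorruptSource(String blockId, String sourceDatanode);
  }

  /**
   * Validate the received bytes against the checksum sent with them.  On a
   * mismatch the write is aborted, and if this was a replication the namenode
   * is told so it can drop the corrupt source replica and pick another source.
   */
  static boolean receiveReplica(String blockId, byte[] data, long sentCrc,
                                boolean isReplication, String sourceDatanode,
                                NameNodeReport namenode) {
    CRC32 crc = new CRC32();
    crc.update(data);
    if (crc.getValue() != sentCrc) {
      if (isReplication) {
        namenode.reportCorruptSource(blockId, sourceDatanode);
      }
      return false;  // abort: the block is not accepted
    }
    // ... write the block and its checksum to local storage ...
    return true;
  }
}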



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509963 ] 

dhruba borthakur commented on HADOOP-1557:
------------------------------------------

I agree. There is not much point in validating replicas just before a setReplication call is issued. Instead, a periodic disk block validation by the Datanode might be handy in detecting these types of problems.
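
Such a scanner could be as simple as the sketch below; the BlockStore interface and the scan period are placeholders, not existing datanode code.

import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: a datanode-side scan that periodically re-verifies block CRCs. */
class PeriodicBlockScanner {

  /** Placeholder for the datanode's local block storage and its namenode link. */
  interface BlockStore {
    List<String> blockIds();
    boolean verifyCrc(String blockId);   // recompute the CRC, compare with the stored one
    void reportCorrupt(String blockId);  // tell the namenode this replica is bad
  }

  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  /** Re-scan all local blocks every periodHours, reporting any CRC mismatch. */
  void start(BlockStore store, long periodHours) {
    scheduler.scheduleAtFixedRate(() -> {
      for (String blockId : store.blockIds()) {
        if (!store.verifyCrc(blockId)) {
          store.reportCorrupt(blockId);
        }
      }
    }, periodHours, periodHours, TimeUnit.HOURS);
  }
}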



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509697 ] 

Doug Cutting commented on HADOOP-1557:
--------------------------------------

> The Namenode should remove and replace corrupt replicas when they are reported by clients/datanodes, probably on read to start off with.

That's already done.  When a checksum error is encountered, and there's more than one replica, the offending replica is removed.
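
For reference, a schematic of that behavior with hypothetical names; the actual namenode data structures are more involved:

import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of namenode-side handling of a corrupt-replica report. */
class CorruptReplicaHandler {

  /** blockId -> datanodes currently believed to hold a replica. */
  private final Map<String, Set<String>> blockLocations = new HashMap<>();

  void addReplica(String blockId, String datanode) {
    blockLocations.computeIfAbsent(blockId, b -> new HashSet<>()).add(datanode);
  }

  /**
   * A client or datanode reported a checksum error for blockId on datanode.
   * The replica is dropped only if at least one other copy remains, and a
   * re-replication from a surviving copy is then scheduled.
   */
  void reportCorruptReplica(String blockId, String datanode) {
    Set<String> locations = blockLocations.get(blockId);
    if (locations != null && locations.size() > 1 && locations.remove(datanode)) {
      scheduleReplication(blockId, locations);
    }
  }

  private void scheduleReplication(String blockId, Set<String> healthySources) {
    // Placeholder: pick a surviving source and a new target datanode.
  }
}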



[jira] Commented: (HADOOP-1557) Deletion of excess replicas should prefer to delete corrupted replicas before deleting valid replicas

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509636 ] 

Sameer Paranjpye commented on HADOOP-1557:
------------------------------------------

Why not have clients and/or datanodes report corrupt replicas to the Namenode? Is this not already done?

The Namenode should remove and replace corrupt replicas when they are reported by clients/datanodes, probably on read to start off with. This can subsequently be extended with periodic block scanning.
