Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2007/10/09 06:36:50 UTC

[jira] Updated: (HADOOP-2012) Periodic verification at the Datanode

     [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-2012:
-------------------------------------

    Component/s: dfs
    Description: 
Currently, on-disk corruption of data blocks is detected only when a block is read by a client or by another datanode.  These errors would be detected much earlier if the datanode could periodically verify the checksums of its local blocks.

Some of the issues to consider:

- How often should the blocks be checked? (No more often than once every couple of weeks?)
- How do we keep track of when a block was last verified? (There is a .meta file associated with each block; see the sketch after this list.)
- What action should be taken once a corruption is detected?
- Scanning should run at very low priority, keeping the rest of the datanode's disk traffic in mind.
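
As an illustration of the verification step itself, here is a minimal sketch in Java. It assumes a simplified metadata layout (one CRC32 value per fixed-size chunk of the block file); the actual HDFS .meta header and checksum format are not reproduced, and the class and method names are hypothetical.

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.zip.CRC32;

    public class BlockChecksumVerifier {

      private static final int CHUNK_SIZE = 512;  // bytes covered by each stored checksum

      /** Returns true if every chunk of the block file matches its stored CRC32. */
      public static boolean verify(String blockFile, String metaFile) throws IOException {
        FileInputStream block = new FileInputStream(blockFile);
        DataInputStream meta = new DataInputStream(new FileInputStream(metaFile));
        try {
          byte[] chunk = new byte[CHUNK_SIZE];
          CRC32 crc = new CRC32();
          int read;
          // Assumes read() fills the buffer except possibly for the final chunk.
          while ((read = block.read(chunk)) > 0) {
            crc.reset();
            crc.update(chunk, 0, read);
            long expected = meta.readInt() & 0xFFFFFFFFL;  // stored checksum for this chunk
            if (crc.getValue() != expected) {
              return false;  // corruption detected
            }
          }
          return true;
        } finally {
          block.close();
          meta.close();
        }
      }
    }

A periodic scanner at the datanode could call something like this for each local block, as discussed in the proposal below.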

  was:

Currently, on-disk corruption of data blocks is detected only when a block is read by a client or by another datanode.  These errors would be detected much earlier if the datanode could periodically verify the checksums of its local blocks.

Some of the issues to consider:

- How often should the blocks be checked? (No more often than once every couple of weeks?)
- How do we keep track of when a block was last verified? (There is a .meta file associated with each block.)
- What action should be taken once a corruption is detected?
- Scanning should run at very low priority, keeping the rest of the datanode's disk traffic in mind.


Another proposal would be to dedicate a certain percentage of the disk bandwidth (e.g. 5%) to verification tasks. (I believe HDFS adopts a similar approach for rebalancing, where a certain amount of bandwidth is reserved for rebalancing jobs.)
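
For a rough sense of scale (the figures here are assumptions, not measurements): if a disk sustains about 50 MB/s and 5% of that, roughly 2.5 MB/s, is reserved for verification, then scanning 1 TB of block data takes about 400,000 seconds, or a bit under five days. A budget on that order would therefore cover every local block well within the "couple of weeks" window mentioned above.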

The Datanode would continuously verify blocks in a round-robin manner. Would it be enough to store the last verification time of a block only in memory (not on disk) and start from a random block when the datanode restarts? Once a corruption is detected, the datanode can remove that block and trigger a block report.
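
Putting these pieces together, here is a rough sketch of what such a scanner thread might look like, assuming a checksum-verification routine like the one sketched earlier, an in-memory map of last verification times, and simple sleep-based pacing to stay within a configured bandwidth budget. None of these names come from the Hadoop code base; they are placeholders for illustration.

    import java.io.File;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    /** Hypothetical low-priority block scanner; all names are placeholders. */
    public class PeriodicBlockScanner implements Runnable {

      /** Callbacks into the datanode; these are assumptions, not real Hadoop APIs. */
      public interface DatanodeHooks {
        List<File> listBlockFiles();                 // current local block files
        boolean verifyChecksums(File blockFile);     // e.g. the verification sketch shown earlier
        void handleCorruptBlock(File blockFile);     // delete the replica and trigger a block report
      }

      private final DatanodeHooks datanode;
      private final long bytesPerSecond;             // verification budget, e.g. ~5% of disk bandwidth
      // Last verification time per block, kept only in memory as proposed above.
      private final Map<File, Long> lastVerified = new HashMap<File, Long>();

      public PeriodicBlockScanner(DatanodeHooks datanode, long bytesPerSecond) {
        this.datanode = datanode;
        this.bytesPerSecond = bytesPerSecond;
      }

      public void run() {
        while (!Thread.currentThread().isInterrupted()) {
          // Round-robin over whatever blocks the datanode currently holds.
          // (On restart, the scan could begin at a random position in this list.)
          for (File block : datanode.listBlockFiles()) {
            long start = System.currentTimeMillis();
            if (!datanode.verifyChecksums(block)) {
              datanode.handleCorruptBlock(block);
            }
            lastVerified.put(block, System.currentTimeMillis());
            throttle(block.length(), System.currentTimeMillis() - start);
          }
        }
      }

      /** Sleep long enough that the average read rate stays near the budget. */
      private void throttle(long bytesRead, long elapsedMillis) {
        long targetMillis = bytesRead * 1000L / bytesPerSecond;
        long sleep = targetMillis - elapsedMillis;
        if (sleep > 0) {
          try {
            Thread.sleep(sleep);
          } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();      // exit cleanly on shutdown
          }
        }
      }
    }

Throttling per block, as above, is the coarsest possible pacing; a real implementation would more likely throttle inside the read loop so that a single large block cannot monopolize the disk.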

> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently, on-disk corruption of data blocks is detected only when a block is read by a client or by another datanode.  These errors would be detected much earlier if the datanode could periodically verify the checksums of its local blocks.
> Some of the issues to consider:
> - How often should the blocks be checked? (No more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified? (There is a .meta file associated with each block.)
> - What action should be taken once a corruption is detected?
> - Scanning should run at very low priority, keeping the rest of the datanode's disk traffic in mind.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.