
Posted to common-dev@hadoop.apache.org by "Raghu Angadi (JIRA)" <ji...@apache.org> on 2007/10/09 19:35:50 UTC

[jira] Commented: (HADOOP-2012) Periodic verification at the Datanode

    [ https://issues.apache.org/jira/browse/HADOOP-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12533426 ] 

Raghu Angadi commented on HADOOP-2012:
--------------------------------------


The main question I am also wondering about is whether to store the last verification time persistently at all. Even if we do store it, this is non-critical data, so a simple text file would do. Persisting it is not that important for real clusters, since we expect the datanodes to stay up for many days. It mainly matters during development, where we start and restart all the time.

So I vote for storing only in memory as well.

Yes, we should throttle the scan rate. We could cap it at something like 5% (what 5% means is another issue!) and verify each block at most once a week. Rebalancing already adds an optional throttler for the reader; I will use the same one. Also, whenever a complete block is read without errors as part of normal operation, we will update its last verification time.
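
To make the proposal above concrete, here is a rough sketch of an in-memory scanner: last verification times live in a dictionary, a block becomes due again only after a minimum interval, a clean full read by a client refreshes the timestamp, and reads go through a byte-rate throttler. All names here (`Throttler`, `BlockScanner`, `read_chunks`) and the default values are assumptions made for illustration; this is not the datanode code or the rebalancing throttler itself.

```python
import time


class Throttler:
    """Caps average throughput at bytes_per_sec by sleeping when ahead of budget."""

    def __init__(self, bytes_per_sec, clock=time.monotonic, sleep=time.sleep):
        self.bytes_per_sec = bytes_per_sec
        self.clock = clock
        self.sleep = sleep
        self.start = clock()
        self.sent = 0

    def throttle(self, num_bytes):
        self.sent += num_bytes
        # Earliest time at which `sent` bytes still fit within the budget.
        earliest = self.start + self.sent / self.bytes_per_sec
        delay = earliest - self.clock()
        if delay > 0:
            self.sleep(delay)


class BlockScanner:
    """Tracks last verification times in memory only (lost on restart)."""

    def __init__(self, throttler, min_interval_secs=7 * 24 * 3600, clock=time.time):
        self.throttler = throttler
        self.min_interval_secs = min_interval_secs  # "at most once a week"
        self.clock = clock
        self.last_verified = {}  # block id -> timestamp of last clean verification

    def add_block(self, block_id):
        # Unknown blocks start at time 0, so they are scanned first.
        self.last_verified.setdefault(block_id, 0.0)

    def mark_verified(self, block_id):
        # Also call this when a normal read covers the whole block without errors.
        self.last_verified[block_id] = self.clock()

    def due_blocks(self):
        now = self.clock()
        due = [b for b, t in self.last_verified.items()
               if now - t >= self.min_interval_secs]
        return sorted(due, key=lambda b: self.last_verified[b])  # oldest first

    def scan_once(self, read_chunks):
        """read_chunks(block_id) yields the size of each verified chunk and
        raises IOError on a checksum mismatch (hypothetical callback)."""
        corrupt = []
        for block_id in self.due_blocks():
            try:
                for num_bytes in read_chunks(block_id):
                    self.throttler.throttle(num_bytes)
            except IOError:
                corrupt.append(block_id)
            else:
                self.mark_verified(block_id)
        return corrupt
```

What to do with the block ids returned by `scan_once` is left open here, as it is in the issue description.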






> Periodic verification at the Datanode
> -------------------------------------
>
>                 Key: HADOOP-2012
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2012
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>
> Currently, on-disk corruption of data blocks is detected only when a block is read by the client or by another datanode.  These errors would be detected much earlier if the datanode periodically verified the data checksums of its local blocks.
> Some of the issues to consider:
> - How often should we check the blocks (no more often than once every couple of weeks?)
> - How do we keep track of when a block was last verified (there is a .meta file associated with each block)?
> - What action should we take once corruption is detected?
> - Scanning should be done at very low priority, with the rest of the datanode's disk traffic in mind.
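
As a rough illustration of what verifying the data checksums of a local block could look like, the sketch below recomputes a CRC32 per fixed-size chunk and compares it with the stored values. The chunk size, the function names, and the use of a plain list in place of the .meta file are assumptions for this example; the actual .meta layout is not described in this message.

```python
import zlib

BYTES_PER_CHECKSUM = 512  # assumed chunk size, for illustration only


def compute_checksums(data, chunk=BYTES_PER_CHECKSUM):
    """Return one CRC32 per chunk of `data` (the last chunk may be short)."""
    return [zlib.crc32(data[i:i + chunk]) for i in range(0, len(data), chunk)]


def first_corrupt_chunk(data, checksums, chunk=BYTES_PER_CHECKSUM):
    """Return the index of the first chunk that fails verification, or None.

    `checksums` stands in for the values a datanode would read from the
    block's .meta file.
    """
    actual = compute_checksums(data, chunk)
    for i, (got, expected) in enumerate(zip(actual, checksums)):
        if got != expected:
            return i
    if len(actual) != len(checksums):
        # Block is shorter or longer than the checksum list says it should be.
        return min(len(actual), len(checksums))
    return None
```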
