Posted to common-dev@hadoop.apache.org by "Brian Bockelman (JIRA)" <ji...@apache.org> on 2009/01/08 19:46:59 UTC

[jira] Created: (HADOOP-4994) Datanode should verify block sizes vs metadata on startup

Datanode should verify block sizes vs metadata on startup
---------------------------------------------------------

                 Key: HADOOP-4994
                 URL: https://issues.apache.org/jira/browse/HADOOP-4994
             Project: Hadoop Core
          Issue Type: Bug
            Reporter: Brian Bockelman


I could have sworn this bug had already been reported by someone else, but I can't find it in JIRA after searching. Apologies if this is a duplicate.

The datanode, upon starting up, should check and make sure that all block sizes as reported via `stat` are the same as the block sizes as reported via the block's metadata.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4994) Datanode should verify block sizes vs metadata on startup

Posted by "Brian Bockelman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662409#action_12662409 ] 

Brian Bockelman commented on HADOOP-4994:
-----------------------------------------

Hey Dhruba,

That is correct. (I should mention that, since this is a Java project rather than a Unix one, `stat` here is equivalent to File.length().)

This is the use case:
1) Node loses power.
2) On reboot, Linux triggers an automatic fsck of the filesystem holding Hadoop's storage.
3) To clean up some discovered corruption, Linux truncates one of Hadoop's blocks.
4) Hadoop starts up, reads in the metadata, and assumes the block is OK.

I would like to alter step (4) to be:
4) Hadoop starts up and reads in the metadata.
5) Hadoop checks that the block length recorded in the metadata file is the same as the block file length reported by the ext3 filesystem.
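
The proposed step (5) could be sketched roughly as follows. This is only an illustration, not Hadoop code: the class name, the method name, and the `recordedLengths` map (block file to expected length) are hypothetical, and it assumes an expected length is available from somewhere. Only File.length() is a real API here.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop.
public class BlockLengthCheck {

    /**
     * Returns the block files whose on-disk length differs from the
     * length recorded for them. The source of recordedLengths is an
     * assumption of this sketch.
     */
    public static List<File> findMismatched(Map<File, Long> recordedLengths) {
        List<File> mismatched = new ArrayList<>();
        for (Map.Entry<File, Long> entry : recordedLengths.entrySet()) {
            // File.length() is the Java counterpart of the size from stat.
            // It returns 0 for a missing file, so a vanished block with a
            // non-zero recorded length is flagged as well.
            long onDisk = entry.getKey().length();
            if (onDisk != entry.getValue().longValue()) {
                mismatched.add(entry.getKey());
            }
        }
        return mismatched;
    }
}
```

A block truncated by fsck would show up in the returned list, and the datanode could then treat it as corrupt instead of assuming it is OK.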

My apologies if this is already done and I am just not understanding things correctly.



[jira] Updated: (HADOOP-4994) Datanode should verify block sizes vs metadata on startup

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4994:
-------------------------------------

    Component/s: dfs



[jira] Commented: (HADOOP-4994) Datanode should verify block sizes vs metadata on startup

Posted by "Raghu Angadi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662528#action_12662528 ] 

Raghu Angadi commented on HADOOP-4994:
--------------------------------------


Currently, block metadata does not store the size of the block, and I don't think it should. But the DN can still detect the discrepancy, since the metadata file length and the block file length won't tally (metadata file length should be: header + ((block size + 511)/512)*4).
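
The arithmetic above can be written out as a small sketch. The class and method names are hypothetical, not Hadoop APIs. The sketch assumes that the 512 in the formula is the number of data bytes covered by each checksum and the 4 is the size of one checksum in bytes. The header length is passed in as a parameter rather than hard-coded.

```java
import java.io.File;

// Hypothetical helper, not part of Hadoop.
public class MetaLengthCheck {

    /**
     * Expected metadata file length:
     * header + ceil(blockLength / bytesPerChecksum) * checksumSize.
     */
    public static long expectedMetaLength(long blockLength, int headerLength,
                                          int bytesPerChecksum, int checksumSize) {
        // Integer form of the ceiling division, as in (block size + 511)/512.
        long chunks = (blockLength + bytesPerChecksum - 1) / bytesPerChecksum;
        return headerLength + chunks * checksumSize;
    }

    /** True if the two file lengths on disk agree with the formula. */
    public static boolean lengthsTally(File blockFile, File metaFile, int headerLength,
                                       int bytesPerChecksum, int checksumSize) {
        long expected = expectedMetaLength(blockFile.length(), headerLength,
                                           bytesPerChecksum, checksumSize);
        return metaFile.length() == expected;
    }
}
```

Note that this comparison only catches a truncation that changes the number of checksum chunks. With 512-byte chunks, a block cut from 1000 to 900 bytes still maps to two checksums, so the lengths would still tally and only verifying the checksums themselves would reveal the damage.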

> This is the use case: [...]
In this case, the NN should have detected that the block is smaller than expected. I think it does.



[jira] Commented: (HADOOP-4994) Datanode should verify block sizes vs metadata on startup

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662251#action_12662251 ] 

dhruba borthakur commented on HADOOP-4994:
------------------------------------------

Let me understand this one. Suppose the datanode is storing its blocks on a Linux ext3 filesystem. Are you saying that a stat on the Linux ext3 block file should return the same file size as the one reported by InterDatanodeProtocol.getBlockMetaDataInfo().getNumBytes()?


