Posted to common-issues@hadoop.apache.org by "Dave Thompson (Created) (JIRA)" <ji...@apache.org> on 2012/03/31 00:15:27 UTC

[jira] [Created] (HADOOP-8233) Turn CRC checking off for 0 byte size and differing blocksizes

Turn CRC checking off for 0 byte size and differing blocksizes
--------------------------------------------------------------

                 Key: HADOOP-8233
                 URL: https://issues.apache.org/jira/browse/HADOOP-8233
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 0.23.3
            Reporter: Dave Thompson
            Assignee: Dave Thompson


DistcpV2 (hadoop-tools/hadoop-distcp/..) can fail with a checksum mismatch, sometimes when copying a 0-byte file. The root cause may be an inconsistency in how HDFS creates 0-byte files. However, distcp can avoid the issue by skipping the CRC check when the file size is zero.

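Further, distcp fails the checksum check when copying between two clusters that use different block sizes. The HDFS file checksum is computed block by block, so the same data stored with a different block size generally yields a different checksum. In this case it makes no sense to check the CRC, as it is a guaranteed failure.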

We need to turn CRC checking off for the above two cases.
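To illustrate why differing block sizes defeat the comparison, here is a minimal Java sketch of a block-based composite checksum. It assumes a simplified model of the HDFS MD5-of-MD5-of-CRC32 scheme: a CRC32 per fixed-size chunk, an MD5 over each block's CRCs, then an MD5 over the per-block digests. The class `BlockChecksumDemo` and its parameters are hypothetical, and the byte layout is not Hadoop's actual `MD5MD5CRC32FileChecksum` format; it only shows that identical bytes split into different block sizes produce different file checksums once the data spans more than one block.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical, simplified model of a block-based composite checksum:
// CRC32 per chunk -> MD5 per block -> MD5 over the block MD5s.
// Illustration only; not byte-compatible with Hadoop's file checksum.
class BlockChecksumDemo {

    // MD5 is a required MessageDigest algorithm on every Java platform.
    private static MessageDigest md5() {
        try {
            return MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 unavailable", e);
        }
    }

    static byte[] compositeChecksum(byte[] data, int blockSize, int bytesPerCrc) {
        MessageDigest fileMd5 = md5();
        for (int blockStart = 0; blockStart < data.length; blockStart += blockSize) {
            int blockEnd = Math.min(blockStart + blockSize, data.length);
            MessageDigest blockMd5 = md5();
            for (int chunk = blockStart; chunk < blockEnd; chunk += bytesPerCrc) {
                int chunkEnd = Math.min(chunk + bytesPerCrc, blockEnd);
                CRC32 crc = new CRC32();
                crc.update(data, chunk, chunkEnd - chunk);
                long v = crc.getValue();
                // Feed the 4-byte big-endian CRC value into this block's MD5.
                blockMd5.update(new byte[] {
                    (byte) (v >>> 24), (byte) (v >>> 16), (byte) (v >>> 8), (byte) v });
            }
            // The file-level digest is built from the per-block digests,
            // so the block boundaries directly shape the result.
            fileMd5.update(blockMd5.digest());
        }
        return fileMd5.digest();
    }

    public static void main(String[] args) {
        byte[] data = new byte[4096];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        // Same bytes, same chunk size, different block sizes.
        byte[] small = compositeChecksum(data, 1024, 512);
        byte[] large = compositeChecksum(data, 2048, 512);
        System.out.println("same bytes, same checksum? " + Arrays.equals(small, large));
    }
}
```

Running `main` reports `false`: the 1024-byte layout feeds four block digests into the final MD5 and the 2048-byte layout feeds two, so the final digests differ even though the file contents are identical.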

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (HADOOP-8233) Turn CRC checking off for 0 byte size and differing blocksizes

Posted by "Dave Thompson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435209#comment-13435209 ] 

Dave Thompson commented on HADOOP-8233:
---------------------------------------

I decided it was best to split these two issues. I created HADOOP-8703 to handle the skip-CRC-on-0-byte aspect.
                

[jira] [Commented] (HADOOP-8233) Turn CRC checking off for 0 byte size and differing blocksizes

Posted by "Allen Wittenauer (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251202#comment-13251202 ] 

Allen Wittenauer commented on HADOOP-8233:
------------------------------------------

This should probably have a test.
                

[jira] [Updated] (HADOOP-8233) Turn CRC checking off for 0 byte size and differing blocksizes

Posted by "Dave Thompson (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dave Thompson updated HADOOP-8233:
----------------------------------

    Attachment: HADOOP-8233-branch-0.23.2.patch

The patch skips the CRC check for 0-byte files and when the block sizes of the source and target do not match.
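A minimal Java sketch of the skip rule described in this update, assuming the decision needs only the source file length and the source and target block sizes. `CrcSkipRule` and `shouldSkipCrc` are hypothetical names for illustration, not the code in the attached patch or in DistCp.

```java
// Hypothetical sketch of the HADOOP-8233 skip rule; names are illustrative only.
class CrcSkipRule {

    /** Returns true when the source/target checksum comparison should be skipped. */
    static boolean shouldSkipCrc(long sourceLength, long sourceBlockSize, long targetBlockSize) {
        // Case 1: 0-byte files, whose checksums HDFS may report inconsistently.
        if (sourceLength == 0) {
            return true;
        }
        // Case 2: differing block sizes make a block-based checksum comparison meaningless.
        return sourceBlockSize != targetBlockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // 32 MB source block size (33554432 bytes) vs. an example 64 MB target: skip.
        System.out.println(shouldSkipCrc(10 * mb, 32 * mb, 64 * mb)); // true
        // Zero-length file with matching block sizes: skip.
        System.out.println(shouldSkipCrc(0, 64 * mb, 64 * mb));       // true
        // Non-empty file with matching block sizes: compare checksums as usual.
        System.out.println(shouldSkipCrc(10 * mb, 64 * mb, 64 * mb)); // false
    }
}
```

In a real copy job these values would presumably come from `FileStatus.getLen()` and `FileStatus.getBlockSize()` on the source and target files; the 64 MB figure in `main` is only an example, not a claim about any cluster default.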
                

[jira] [Commented] (HADOOP-8233) Turn CRC checking off for 0 byte size and differing blocksizes

Posted by "Dave Thompson (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-8233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251654#comment-13251654 ] 

Dave Thompson commented on HADOOP-8233:
---------------------------------------

Hey Allen, FWIW, that attachment is not yet the final fix for this ticket. I hope you weren't thinking otherwise before it reached PA (Patch Available) state.

Regarding tests, I've been unit testing by creating objects whose block size differs from the system default, along the lines of:

 hdfs dfs -Ddfs.blocksize=33554432 -put testData /user/davet/testDataBS32MB

Likewise, for a zero-length file:

 touch bla
 hdfs dfs -put bla /user/davet/bla

distcp is then run on the above data with the system defaults. These tests fail without this patch and will succeed once the patch is complete.
                