You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-issues@hadoop.apache.org by "Kai Xie (JIRA)" <ji...@apache.org> on 2019/01/17 14:29:00 UTC

[jira] [Comment Edited] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

    [ https://issues.apache.org/jira/browse/HADOOP-16049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745112#comment-16745112 ] 

Kai Xie edited comment on HADOOP-16049 at 1/17/19 2:28 PM:
-----------------------------------------------------------

This ticket has been resolved by HADOOP-15292 by not using positioned read in trunk/branch-3.1/branch-3.2, but it's not backported to branch-2 yet


was (Author: kai33):
This ticket has been resolved by HADOOP-15292 by not using positioned read, but it's not backported to branch-2 yet

> DistCp result has data and checksum mismatch when blocks per chunk > 0
> ----------------------------------------------------------------------
>
>                 Key: HADOOP-16049
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16049
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: tools/distcp
>    Affects Versions: 2.9.2
>            Reporter: Kai Xie
>            Priority: Major
>
> In 2.9.2 RetriableFileCopyCommand.copyBytes,
> {code:java}
> int bytesRead = readBytes(inStream, buf, sourceOffset);
> while (bytesRead >= 0) {
>   ...
>   if (action == FileAction.APPEND) {
>     sourceOffset += bytesRead;
>   }
>   ... // write to dst
>   bytesRead = readBytes(inStream, buf, sourceOffset);
> }{code}
> it does a positioned read but the position (`sourceOffset` here) is never updated when blocks per chunk is set to > 0 (which always disables append action). So for chunk with offset != 0, it will keep copying the first few bytes again and again, causing result to have data & checksum mismatch.
> To re-produce this issue, in branch-2, update BLOCK_SIZE to 10240 (> default copy buffer size) in class TestDistCpSystem.
> HADOOP-15292 has resolved this ticket by not using the positioned read, but has not been backported to branch-2 yet
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org