Posted to common-issues@hadoop.apache.org by "Kai Xie (JIRA)" <ji...@apache.org> on 2019/01/17 14:18:00 UTC

[jira] [Created] (HADOOP-16049) DistCp result has data and checksum mismatch when blocks per chunk > 0

Kai Xie created HADOOP-16049:
--------------------------------

             Summary: DistCp result has data and checksum mismatch when blocks per chunk > 0
                 Key: HADOOP-16049
                 URL: https://issues.apache.org/jira/browse/HADOOP-16049
             Project: Hadoop Common
          Issue Type: Bug
          Components: tools/distcp
    Affects Versions: 2.9.2
            Reporter: Kai Xie


In 2.9.2, RetriableFileCopyCommand.copyBytes does the following:

{code:java}
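int bytesRead = readBytes(inStream, buf, sourceOffset);
while (bytesRead >= 0) {
  ...
  // sourceOffset only advances for APPEND; with blocks per chunk > 0 the
  // action is never APPEND, so the offset stays at the chunk start
  if (action == FileAction.APPEND) {
    sourceOffset += bytesRead;
  }
  ... // write to dst
  // positioned read at the same, unchanged sourceOffset
  bytesRead = readBytes(inStream, buf, sourceOffset);
}
{code}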
It does a positioned read, but the position (`sourceOffset` here) is never advanced when blocks per chunk is set to a value > 0, because that setting always disables the APPEND action. So for any chunk whose offset is non-zero, it keeps copying the same leading bytes of the chunk again and again, and the result has a data and checksum mismatch.
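
To make the failure concrete, here is a small self-contained Java simulation (not Hadoop code). `InMemorySource` is a hypothetical stand-in for a positioned-readable input stream, and `copyChunk` only mirrors the shape of the loop above; its `advanceOffset` flag plays the role of `action == FileAction.APPEND`. Because the sketch always uses positioned reads, it only models the chunk-with-non-zero-offset case described in this report.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class ChunkCopyDemo {

    /** Hypothetical stand-in for a positioned-readable source stream. */
    static final class InMemorySource {
        private final byte[] data;

        InMemorySource(byte[] data) {
            this.data = data;
        }

        /** Positioned read: copies up to len bytes from position; no cursor moves. Returns -1 at EOF. */
        int read(long position, byte[] buf, int off, int len) {
            if (position >= data.length) {
                return -1;
            }
            int n = (int) Math.min(len, data.length - position);
            System.arraycopy(data, (int) position, buf, off, n);
            return n;
        }
    }

    /**
     * Copies chunkLength bytes starting at chunkOffset.
     * advanceOffset == false reproduces the bug (action != APPEND);
     * advanceOffset == true advances the read position after every read.
     */
    static byte[] copyChunk(InMemorySource in, long chunkOffset, long chunkLength,
                            int bufSize, boolean advanceOffset) {
        byte[] buf = new byte[bufSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long sourceOffset = chunkOffset;
        long total = 0;
        while (total < chunkLength) {
            int bytesRead = in.read(sourceOffset, buf, 0, buf.length);
            if (bytesRead < 0) {
                break; // end of source
            }
            int n = (int) Math.min(bytesRead, chunkLength - total);
            out.write(buf, 0, n);
            total += n;
            if (advanceOffset) {
                sourceOffset += n;
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        InMemorySource src = new InMemorySource(data);
        // chunk at offset 16, length 32, copied through an 8-byte buffer
        byte[] expected = Arrays.copyOfRange(data, 16, 48);
        System.out.println("buggy matches: " + Arrays.equals(expected, copyChunk(src, 16, 32, 8, false))); // false
        System.out.println("fixed matches: " + Arrays.equals(expected, copyChunk(src, 16, 32, 8, true)));  // true
    }
}
```

With an 8-byte buffer, the buggy copy of the 32-byte chunk at offset 16 is bytes 16..23 written four times, so it cannot match the source range 16..47.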

HADOOP-15292 already fixes this by not using the positioned read, but that fix has not been backported to branch-2 yet.
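
The approach described for HADOOP-15292, reading sequentially instead of with positioned reads, can be sketched with plain java.io streams. This is not the actual patch: `SequentialChunkCopy` and `skipFully` are hypothetical names, and `skipFully` stands in for seeking a Hadoop input stream to the chunk start once. After that, every read advances the stream's own position, so there is no separate offset that can be left un-updated.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class SequentialChunkCopy {

    /** Moves the stream to the chunk start (hypothetical stand-in for a seek). */
    static void skipFully(InputStream in, long n) throws IOException {
        while (n > 0) {
            long skipped = in.skip(n);
            if (skipped <= 0) {
                // skip() may return 0 without being at EOF; read one byte to make progress or detect EOF
                if (in.read() < 0) {
                    throw new IOException("EOF before chunk start");
                }
                skipped = 1;
            }
            n -= skipped;
        }
    }

    /** Copies chunkLength bytes starting at chunkOffset using only sequential reads. */
    static byte[] copyChunk(InputStream in, long chunkOffset, long chunkLength, int bufSize)
            throws IOException {
        skipFully(in, chunkOffset);
        byte[] buf = new byte[bufSize];
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long total = 0;
        while (total < chunkLength) {
            int toRead = (int) Math.min(buf.length, chunkLength - total);
            int bytesRead = in.read(buf, 0, toRead); // the stream's own position advances
            if (bytesRead < 0) {
                break; // source ended before the chunk did
            }
            out.write(buf, 0, bytesRead);
            total += bytesRead;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[64];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copied = copyChunk(new ByteArrayInputStream(data), 16, 32, 8);
        System.out.println(Arrays.equals(Arrays.copyOfRange(data, 16, 48), copied)); // true
    }
}
```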
