Posted to mapreduce-issues@hadoop.apache.org by "BELUGA BEHR (JIRA)" <ji...@apache.org> on 2017/08/03 22:09:00 UTC

[jira] [Commented] (MAPREDUCE-1821) IFile.Reader should check whether data crc has checked before it stop reading.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16113585#comment-16113585 ] 

BELUGA BEHR commented on MAPREDUCE-1821:
----------------------------------------

Well, I don't think this is a big deal.  There is only one checksum for the entire file, so you can't trust the values returned by {{nextRawKey}} until you get to the end of the file anyway.  At that point, a call to the {{close}} method will cause the remaining bytes to be read and the checksum to be verified.

{code}
List<Map.Entry<K,V>> results = new ArrayList<>();
IFile.Reader reader = new IFile.Reader(...);
try {
  // file contents: 1,2,3,EOF,5,6,7,EOF (the first EOF marker is spurious)
  while (reader.nextRawKey(buf)) {
    // deserialize buffer into keyValue
    reader.nextRawValue(buf);
    // deserialize buffer into mapValue
    // Map.Entry is an interface, so use a concrete implementation
    results.add(new AbstractMap.SimpleEntry<>(keyValue, mapValue));
  }
} finally {
  try {
    // reads rest of file and validates checksum
    reader.close();
  } catch (ChecksumException cse) {
    // results holds only 1,2,3 here; discard them
    results.clear();
  }
}
return results;
{code}

Now, I don't know if this is being done anywhere, but the facilities are there.
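
To make the close-time check concrete, here is a small self-contained sketch.  It is not the Hadoop code: it assumes a toy format with one-byte key and value lengths (the real IFile uses variable-length ints), uses {{java.util.zip.CRC32}} over the whole body, and throws a plain {{IOException}} where Hadoop would throw a {{ChecksumException}}.  The names {{TrailingChecksumDemo}}, {{Reader}}, {{write}} and {{readAll}} are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

class TrailingChecksumDemo {

  /** Builds a toy file: [keyLen][valLen][key][val]... [-1][-1][4-byte CRC32]. */
  static byte[] write(String[][] records) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (String[] kv : records) {
      byte[] k = kv[0].getBytes(StandardCharsets.UTF_8);
      byte[] v = kv[1].getBytes(StandardCharsets.UTF_8);
      out.write(k.length);
      out.write(v.length);
      out.write(k, 0, k.length);
      out.write(v, 0, v.length);
    }
    out.write(-1); // EOF marker for key length
    out.write(-1); // EOF marker for value length
    byte[] body = out.toByteArray();
    CRC32 crc = new CRC32();
    crc.update(body, 0, body.length);
    return ByteBuffer.allocate(body.length + 4)
        .put(body).putInt((int) crc.getValue()).array();
  }

  /** Hypothetical stand-in for a reader that only validates on close(). */
  static class Reader {
    private final byte[] data;
    private final int bodyEnd; // index where the 4-byte CRC starts
    private int pos = 0;

    Reader(byte[] data) {
      this.data = data;
      this.bodyEnd = data.length - 4;
    }

    /** Returns the next {key, value} pair, or null once it reads -1, -1. */
    String[] next() {
      byte keyLen = data[pos++];
      byte valLen = data[pos++];
      if (keyLen == -1 && valLen == -1) {
        return null; // looks like EOF, whether or not it really is
      }
      String key = new String(data, pos, keyLen, StandardCharsets.UTF_8);
      pos += keyLen;
      String val = new String(data, pos, valLen, StandardCharsets.UTF_8);
      pos += valLen;
      return new String[] {key, val};
    }

    /** Checks the CRC over the whole body, including bytes next() never reached. */
    void close() throws IOException {
      CRC32 crc = new CRC32();
      crc.update(data, 0, bodyEnd);
      int stored = ByteBuffer.wrap(data, bodyEnd, 4).getInt();
      if ((int) crc.getValue() != stored) {
        throw new IOException("checksum mismatch");
      }
    }
  }

  /** Same shape as the loop above: read until EOF, then let close() decide. */
  static List<String[]> readAll(byte[] file) {
    List<String[]> results = new ArrayList<>();
    Reader reader = new Reader(file);
    try {
      String[] kv;
      while ((kv = reader.next()) != null) {
        results.add(kv);
      }
    } finally {
      try {
        reader.close();
      } catch (IOException e) {
        results.clear(); // nothing read so far can be trusted
      }
    }
    return results;
  }

  public static void main(String[] args) {
    byte[] good = write(new String[][] {{"a", "1"}, {"b", "2"}, {"c", "3"}, {"d", "4"}});
    byte[] bad = good.clone();
    bad[8] = -1; // overwrite the third record's lengths with a fake EOF marker
    bad[9] = -1;
    System.out.println("intact file:    " + readAll(good).size() + " records");
    System.out.println("corrupted file: " + readAll(bad).size() + " records");
  }
}
```

In this sketch the corrupted copy stops the loop after two records, and it is only the {{close}} call that notices the mismatch and throws the partial results away.  A caller that skipped {{close}}, or ignored its exception, would keep those two records without any warning.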

> IFile.Reader should check whether data crc has checked before it stop reading.
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1821
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1821
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>            Reporter: ZhuGuanyin
>            Assignee: ZhuGuanyin
>
> Currently, IFile data has its CRC checked in IFileInputStream (the doRead method).
> Normally an IFile ends with 2 bytes of -1, the EOF_MARKER for keylength and valuelength, followed by a 4-byte CRC checksum.
> The IFileInputStream checksumIn checks the CRC before IFile.Reader gets the EOF_MARKER.
> IFile.Reader stops reading when positionToNextRecord() reads EOF_MARKER (-1) for keylength and EOF_MARKER (-1) for valuelength.
> But if an error has occurred (the IFile is corrupted) and IFile.Reader reads -1, -1 somewhere other than the end of the IFile, the data may never be checked!
> The Reader then thinks it has read all the data and closes... the task may falsely report success without any warning.


