Posted to common-issues@hadoop.apache.org by "Joe Ellis (JIRA)" <ji...@apache.org> on 2016/05/05 18:55:13 UTC

[jira] [Commented] (HADOOP-13064) LineReader reports incorrect number of bytes read resulting in correctness issues using LineRecordReader

    [ https://issues.apache.org/jira/browse/HADOOP-13064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15272842#comment-15272842 ] 

Joe Ellis commented on HADOOP-13064:
------------------------------------

Yeah, I just bumped to 2.7.2 and my previously failing test now passes. We can close this out.

> LineReader reports incorrect number of bytes read resulting in correctness issues using LineRecordReader
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13064
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13064
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.7.1
>            Reporter: Joe Ellis
>            Priority: Critical
>         Attachments: LineReaderTest.java
>
>
> The specific issue we were seeing with LineReader is that when we pass in '\r\n' as the line delimiter, the number of bytes it claims to have read is less than what it actually read. We narrowed this down to only happening when the delimiter is split across the internal buffer boundary: if fillBuffer fills the buffer with "row\r" and the next call fills it with "\n", then the number of bytes reported is 4 rather than 5.
> This results in correctness issues in LineRecordReader: if this off-by-one error occurs enough times while reading a split, the reader continues to read records past its split boundary, so records appear to come from multiple splits.
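
For readers without the attachment, the accounting described above can be sketched as follows. This is not Hadoop's LineReader and not the attached LineReaderTest.java; the class DelimiterCountSketch and its methods are hypothetical names made up for illustration, and the fallback logic assumes a delimiter such as "\r\n" whose first byte does not repeat inside it. The point is only that the count of matched delimiter bytes has to survive a buffer refill, so that "row\r" followed by "\n" is reported as 5 bytes consumed.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; this is NOT Hadoop's LineReader. */
public class DelimiterCountSketch {
    private final InputStream in;
    private final byte[] delimiter;
    private final byte[] buffer;
    private int bufferLength = 0; // number of valid bytes in buffer
    private int bufferPos = 0;    // index of the next unread byte

    public DelimiterCountSketch(InputStream in, byte[] delimiter, int bufferSize) {
        this.in = in;
        this.delimiter = delimiter;
        this.buffer = new byte[bufferSize];
    }

    /** Reads one record into 'record'; returns bytes consumed, delimiter included. */
    public int readRecord(StringBuilder record) throws IOException {
        int bytesConsumed = 0;
        int matched = 0; // delimiter bytes matched so far; kept across refills
        while (true) {
            if (bufferPos >= bufferLength) {
                // Refill. 'matched' is deliberately NOT reset here.
                bufferLength = in.read(buffer, 0, buffer.length);
                bufferPos = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    appendDelimiterPrefix(record, matched); // dangling partial match is data
                    return bytesConsumed;
                }
            }
            byte b = buffer[bufferPos++];
            bytesConsumed++; // every byte taken from the stream is counted
            if (b == delimiter[matched]) {
                matched++;
                if (matched == delimiter.length) {
                    return bytesConsumed;
                }
            } else {
                // False start: the partially matched bytes were ordinary data.
                appendDelimiterPrefix(record, matched);
                if (b == delimiter[0]) {
                    matched = 1;
                } else {
                    matched = 0;
                    record.append((char) (b & 0xff));
                }
            }
        }
    }

    private void appendDelimiterPrefix(StringBuilder record, int count) {
        for (int i = 0; i < count; i++) {
            record.append((char) (delimiter[i] & 0xff));
        }
    }

    /** Convenience wrapper: bytes consumed for each record in 'data'. */
    public static int[] consumedPerRecord(String data, String delimiter, int bufferSize) {
        DelimiterCountSketch reader = new DelimiterCountSketch(
                new ByteArrayInputStream(data.getBytes(StandardCharsets.US_ASCII)),
                delimiter.getBytes(StandardCharsets.US_ASCII), bufferSize);
        List<Integer> counts = new ArrayList<>();
        try {
            int n;
            while ((n = reader.readRecord(new StringBuilder())) > 0) {
                counts.add(n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts.stream().mapToInt(Integer::intValue).toArray();
    }

    public static void main(String[] args) {
        // A 4-byte buffer makes the first fill "row\r" and the second start with "\n".
        for (int n : consumedPerRecord("row\r\nnext\r\n", "\r\n", 4)) {
            System.out.println("bytes consumed: " + n);
        }
    }
}
```

A reader that tracks its position within a split by summing these counts only stops at the right place if the per-record counts add up to the true number of bytes read, which is why an undercount of one byte per affected record lets it drift past the split boundary.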



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
