Posted to common-issues@hadoop.apache.org by "David Jou (JIRA)" <ji...@apache.org> on 2017/12/04 02:38:00 UTC

[jira] [Commented] (HADOOP-13192) org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly

    [ https://issues.apache.org/jira/browse/HADOOP-13192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16276229#comment-16276229 ] 

David Jou commented on HADOOP-13192:
------------------------------------

I want to report a test case showing that a multibyte delimiter split across two buffers is still handled incorrectly. If the run of ambiguous characters at the end of a buffer is longer than one, the match is attempted only once, and when it fails all of the ambiguous characters are emitted as data.

        String Delimiter = "***|";
        String CurrentBufferTailToken = "***|data***";
        String NextBufferHeadToken = "*|";
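
For reference, here is a minimal sketch of the records a correct reader would be expected to return for the two buffers above. It is not how LineReader works internally: it simply appends each buffer to a pending string and cuts at every complete delimiter, so a delimiter that straddles the buffer boundary is found naturally. The class and method names (ChunkedSplitSketch, split) are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reference splitter, not Hadoop code.
class ChunkedSplitSketch {

    // Feeds the buffers in order and returns every record found.
    static List<String> split(List<String> buffers, String delimiter) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        for (String chunk : buffers) {
            pending.append(chunk);
            int idx;
            // Cut out every complete delimiter seen so far; whatever is
            // left stays pending until the next buffer arrives.
            while ((idx = pending.indexOf(delimiter)) >= 0) {
                records.add(pending.substring(0, idx));
                pending.delete(0, idx + delimiter.length());
            }
        }
        // Trailing data without a closing delimiter is the last record.
        if (pending.length() > 0) {
            records.add(pending.toString());
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> buffers = Arrays.asList("***|data***", "*|");
        System.out.println(split(buffers, "***|"));
    }
}
```

With the buffers from the test case, the joined stream is "***|data****|", so the expected records are an empty record followed by "data*". A reader that flushes all three ambiguous asterisks as data after one failed match would miss the second delimiter.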
   

> org.apache.hadoop.util.LineReader cannot handle multibyte delimiters correctly
> ------------------------------------------------------------------------------
>
>                 Key: HADOOP-13192
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13192
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 2.6.2
>            Reporter: binde
>            Assignee: binde
>            Priority: Critical
>             Fix For: 2.8.0, 2.7.3, 2.6.5, 3.0.0-alpha1
>
>         Attachments: 0001-HADOOP-13192-org.apache.hadoop.util.LineReader-match.patch, 0002-fix-bug-hadoop-1392-add-test-case-for-LineReader.patch, HADOOP-13192.final.patch
>
>   Original Estimate: 5m
>  Remaining Estimate: 5m
>
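> org.apache.hadoop.util.LineReader.readCustomLine() has a bug.
> When the line is aaaabccc and recordDelimiter is aaab, the result should be the two records a and ccc, but after a partial match fails the loop backs up only one byte, so the delimiter is missed.
> The code at line 310 is: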
>       for (; bufferPosn < bufferLength; ++bufferPosn) {
>         if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>           delPosn++;
>           if (delPosn >= recordDelimiterBytes.length) {
>             bufferPosn++;
>             break;
>           }
>         } else if (delPosn != 0) {
>           bufferPosn--;
>           delPosn = 0;
>         }
>       }
> It should be:
>       for (; bufferPosn < bufferLength; ++bufferPosn) {
>         if (buffer[bufferPosn] == recordDelimiterBytes[delPosn]) {
>           delPosn++;
>           if (delPosn >= recordDelimiterBytes.length) {
>             bufferPosn++;
>             break;
>           }
>         } else if (delPosn != 0) {
>          // ------------- change here ------------- start ----
>           bufferPosn -= delPosn;
>          // ------------- change here ------------- end ----
>   
>           delPosn = 0;
>         }
>       }
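
The difference between the two loops quoted above can be reproduced outside Hadoop with a small standalone sketch. It assumes the whole input sits in a single buffer, so it says nothing about the cross-buffer case; the class name DelimiterScanSketch and the useFix flag are hypothetical, and only the loop body mirrors the quoted code.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-buffer model of the scanning loop, not Hadoop code.
class DelimiterScanSketch {

    // useFix = false: back up one byte on a failed partial match (original loop).
    // useFix = true:  back up delPosn bytes (proposed change).
    static List<String> split(String data, String delimiter, boolean useFix) {
        byte[] buffer = data.getBytes(StandardCharsets.UTF_8);
        byte[] delim = delimiter.getBytes(StandardCharsets.UTF_8);
        List<String> records = new ArrayList<>();
        int start = 0;   // first byte of the current record
        int delPosn = 0; // number of delimiter bytes matched so far
        for (int pos = 0; pos < buffer.length; ++pos) {
            if (buffer[pos] == delim[delPosn]) {
                delPosn++;
                if (delPosn >= delim.length) {
                    // Full delimiter matched: emit the record before it.
                    int end = pos + 1 - delim.length;
                    records.add(new String(buffer, start, end - start, StandardCharsets.UTF_8));
                    start = pos + 1;
                    delPosn = 0;
                }
            } else if (delPosn != 0) {
                // Partial match failed: rewind, then the loop's ++pos advances.
                pos -= useFix ? delPosn : 1;
                delPosn = 0;
            }
        }
        // Remaining bytes form the last record.
        if (start < buffer.length) {
            records.add(new String(buffer, start, buffer.length - start, StandardCharsets.UTF_8));
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(split("aaaabccc", "aaab", false)); // original loop
        System.out.println(split("aaaabccc", "aaab", true));  // proposed loop
    }
}
```

For the input aaaabccc with delimiter aaab, the original rewind leaves the input as one record, aaaabccc, while rewinding by delPosn yields a and ccc.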



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
