Posted to dev@flume.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2013/09/28 03:31:03 UTC

[jira] [Commented] (FLUME-2182) Spooling Directory Source will not ingest data completely when a wide character appears at the edge of a buffer

    [ https://issues.apache.org/jira/browse/FLUME-2182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13780638#comment-13780638 ] 

Hudson commented on FLUME-2182:
-------------------------------

FAILURE: Integrated in flume-trunk #502 (See [https://builds.apache.org/job/flume-trunk/502/])
FLUME-2182. Spooling Directory Source will not ingest data completely when a wide character appears at the edge of a buffer (mpercy: http://git-wip-us.apache.org/repos/asf/flume/repo?p=flume.git&a=commit&h=ffa706429186df2cf8ad04fd9dcba37b6a35d7f1)
* flume-ng-core/src/test/java/org/apache/flume/serialization/TestResettableFileInputStream.java
* flume-ng-core/src/main/java/org/apache/flume/serialization/ResettableFileInputStream.java

                
> Spooling Directory Source will not ingest data completely when a wide character appears at the edge of a buffer
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2182
>                 URL: https://issues.apache.org/jira/browse/FLUME-2182
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>            Reporter: syntony liu
>            Assignee: Sven Meys
>            Priority: Critical
>             Fix For: v1.5.0
>
>         Attachments: Fix_for_FLUME-2182.patch, ModifiedLineDeserializer.java
>
>
> The bug is in ResettableFileInputStream.java, in int readChar().
> If the last byte of buf is only part of a wide character, readChar() shouldn't return -1 (ResettableFileInputStream.java:186), because that
> loses the remaining data in the file.
> I fixed it as follows:
>   public synchronized int readChar() throws IOException {
>     // if (!buf.hasRemaining()) {
>     // refill early so a multi-byte character is less likely to straddle the end of buf
>     if (buf.limit() - buf.position() < 10) {
>       refillBuf();
>     }
>     int start = buf.position();
>     charBuf.clear();
>     boolean isEndOfInput = false;
>     if (position >= fileSize) {
>       isEndOfInput = true;
>     }
>     CoderResult res = decoder.decode(buf, charBuf, isEndOfInput);
>     if (res.isMalformed() || res.isUnmappable()) {
>       res.throwException();
>     }
>     int delta = buf.position() - start;
>     charBuf.flip();
>     if (charBuf.hasRemaining()) {
>       char c = charBuf.get();
>       // don't increment the persisted location if we are in between a
>       // surrogate pair, otherwise we may never recover if we seek() to this
>       // location!
>       incrPosition(delta, !Character.isHighSurrogate(c));
>       return c;
>     } else {
>       // there may be a partial character in the decoder buffer
>       incrPosition(delta, false);
>       return -1;
>     }
>   }
> This avoids the partial-character problem, but it introduces a new issue: sometimes a line of a log file ends up with a repeated character.
> e.g.
>    original file: 123456
>    sink file:     1233456
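
For readers who want to reproduce the underlying behaviour outside Flume, here is a minimal standalone sketch. It is not the committed fix and does not use ResettableFileInputStream; the class and method names (SplitCharDemo, undecodedBytes, decodeInChunks) are hypothetical. It only uses the standard java.nio.charset API to show that when a multi-byte character is cut off at the end of a buffer, CharsetDecoder.decode() leaves the partial bytes unconsumed, and that carrying those bytes over into the next read (here with ByteBuffer.compact()) decodes the input completely without dropping or repeating anything.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

class SplitCharDemo {

  /** Returns how many bytes the decoder leaves unconsumed for one isolated window. */
  static int undecodedBytes(byte[] window) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    ByteBuffer in = ByteBuffer.wrap(window);
    CharBuffer out = CharBuffer.allocate(window.length + 1);
    // endOfInput = false: tell the decoder that more bytes may follow
    decoder.decode(in, out, false);
    return in.remaining();
  }

  /** Decodes data through a small window, carrying partial characters over. */
  static String decodeInChunks(byte[] data, int windowSize) {
    if (windowSize < 4) {
      // a UTF-8 character needs at most 4 bytes; a smaller window could stall
      throw new IllegalArgumentException("windowSize must be at least 4");
    }
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    ByteBuffer buf = ByteBuffer.allocate(windowSize);
    // UTF-8 never yields more chars than input bytes
    CharBuffer out = CharBuffer.allocate(data.length);
    int offset = 0;
    while (true) {
      // top up the window behind any bytes left over from the previous pass
      int n = Math.min(buf.remaining(), data.length - offset);
      buf.put(data, offset, n);
      offset += n;
      buf.flip();
      boolean endOfInput = offset >= data.length;
      CoderResult res = decoder.decode(buf, out, endOfInput);
      if (res.isError()) {
        throw new IllegalArgumentException("undecodable input: " + res);
      }
      // move undecoded bytes (a partial character) to the front of the window
      buf.compact();
      if (endOfInput) {
        break;
      }
    }
    decoder.flush(out);
    out.flip();
    return out.toString();
  }

  public static void main(String[] args) {
    String text = "123456\u20AC789"; // the euro sign is 3 bytes in UTF-8
    byte[] data = text.getBytes(StandardCharsets.UTF_8);
    // first 8 bytes: "123456" plus 2 of the 3 bytes of the euro sign
    byte[] firstWindow = java.util.Arrays.copyOf(data, 8);
    System.out.println("bytes left undecoded: " + undecodedBytes(firstWindow));
    System.out.println("round trip ok: " + decodeInChunks(data, 8).equals(text));
  }
}
```

The point of the sketch is that an empty decode result at the edge of a buffer is not the same as end of file: the leftover bytes have to be kept and completed by the next read before deciding that the input is exhausted.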
