Posted to dev@flume.apache.org by "Johny Rufus (JIRA)" <ji...@apache.org> on 2014/11/12 18:09:34 UTC

[jira] [Comment Edited] (FLUME-2538) TestResettableFileInputStream fails on JDK 8

    [ https://issues.apache.org/jira/browse/FLUME-2538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199523#comment-14199523 ] 

Johny Rufus edited comment on FLUME-2538 at 11/12/14 5:09 PM:
--------------------------------------------------------------

One way to see the difference in behavior is to run the code below on JDK 7 and JDK 8:
@Test
public void testTest() {
    CharsetDecoder decoder = Charsets.UTF_8.newDecoder();

    // Replace malformed or unmappable input instead of reporting an error
    decoder.onMalformedInput(CodingErrorAction.REPLACE);
    decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

    // A 5-byte sequence that is not valid UTF-8
    ByteBuffer buf = ByteBuffer.allocate(20);
    buf.put(new byte[] { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1,
            (byte)0xa1 });
    buf.flip();

    // Room for a single output char, so only the first replacement is written
    CharBuffer cbuf = CharBuffer.allocate(1);
    CoderResult res = decoder.decode(buf, cbuf, false);
    System.out.println(decoder.getClass().getName());
    // buf.position() shows how many input bytes were consumed for that one char
    System.out.println("Pos --- "+buf.position()+ "  cbuf pos --"+cbuf.position());
}

Jdk 7 output -->Pos --- 5  cbuf pos --1
Jdk 8 output -->Pos --- 1  cbuf pos --1

In JDK 7: if there is an invalid byte sequence and CodingErrorAction.REPLACE is specified, the complete invalid byte sequence is treated as one malformed character and replaced by a single replacement character in the output buffer. [Hence the position is advanced by 5, as seen in the output, since it is a 5-byte invalid sequence.]

In JDK 8: each invalid byte in the sequence is treated as a malformed character, hence the buffer is advanced by only one position. So every invalid byte gets its own replacement character in the output buffer.
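
The same difference can also be seen by decoding the whole buffer and counting the replacement characters. The sketch below is only an illustration and not part of the attached patch: the class name MalformedUtf8Demo and its two helper methods are made up for this example, and it uses java.nio.charset.StandardCharsets rather than the Charsets class from the test above so that it has no dependencies outside the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Flume or of the attached patch.
public class MalformedUtf8Demo {

    // Decodes the complete input as UTF-8 with the REPLACE policy.
    static String decodeWithReplace(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        try {
            return decoder.decode(ByteBuffer.wrap(input)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both actions are REPLACE; rethrow unchecked.
            throw new IllegalStateException(e);
        }
    }

    // Counts occurrences of U+FFFD, the default replacement character.
    static int countReplacements(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Same 5-byte invalid sequence as in the test above.
        byte[] invalid = { (byte) 0xf8, (byte) 0xa1, (byte) 0xa1,
                (byte) 0xa1, (byte) 0xa1 };
        String decoded = decodeWithReplace(invalid);
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("replacement chars = " + countReplacements(decoded));
    }
}
```

Going by the behavior described above, this should report 1 replacement character on JDK 7 and 5 on JDK 8, which matches the extra replacement characters in the test failure.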

Attaching a patch that accommodates the above modified behavior 


was (Author: jrufus):
One way to see the difference in the behavior is by running the below code in Jdk 7 and 8
@Test
  public void testTest() {
    CharsetDecoder decoder = Charsets.UTF_8.newDecoder();

    decoder.onMalformedInput(CodingErrorAction.REPLACE);
    decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);

    ByteBuffer buf = ByteBuffer.allocate(20);
    buf.put(new byte[] { (byte)0xf8, (byte)0xa1, (byte)0xa1, (byte)0xa1,
            (byte)0xa1 });
    buf.flip();

    CharBuffer cbuf = CharBuffer.allocate(1);
    CoderResult res = decoder.decode(buf, cbuf, false);
    System.out.println(decoder.getClass().getName());
    System.out.println("Pos --- "+buf.position()+ "  cbuf pos --"+cbuf.position());
}

Jdk 7 output -->Pos --- 5  cbuf pos --1
Jdk 8 output -->Pos --- 1  cbuf pos --1

In Jdk7: If there are a group of malformed characters and CodingErrorAction.Replace is specified, then the complete set of adjacent malformed characters in the buffer are replaced [Hence the position is advanced by 5 as seen in the output as there are 5 malformed chars in the buffer]

In Jdk8: Each malformed character is treated as a separate entity and hence we see the buffer being advanced by only one position. So for every malformed character, we see the replacement character included in the output buffer

Attaching a patch that accomodates the above modified behavior 

> TestResettableFileInputStream fails on JDK 8
> --------------------------------------------
>
>                 Key: FLUME-2538
>                 URL: https://issues.apache.org/jira/browse/FLUME-2538
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: v1.5.0.1
>            Reporter: Johny Rufus
>            Assignee: Johny Rufus
>             Fix For: v1.6.0
>
>         Attachments: FLUME-2538.patch
>
>
> TestResettableFileInputStream.testUtf8DecodeErrorHandlingReplace fails in JDK 8
> "testUtf8DecodeErrorHandlingReplace(org.apache.flume.serialization.TestResettableFileInputStream)  Time elapsed: 6 sec  <<< FAILURE!
> org.junit.ComparisonFailure: expected:<...(���)
> NonUnicode: (�[])
> > but was:<...(���)
> NonUnicode: (�[����]) "
> CharsetDecoder.decode has changed its behavior in how it handles the CodingErrorAction.REPLACE policy.
> Will submit a patch today.


