Posted to dev@flume.apache.org by "Keith Wright (JIRA)" <ji...@apache.org> on 2014/04/10 18:48:14 UTC

[jira] [Updated] (FLUME-2360) Spooling Source: LineDeserializer incorrectly stops reading events if UTF-8 char occurs at end of buffer

     [ https://issues.apache.org/jira/browse/FLUME-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Keith Wright updated FLUME-2360:
--------------------------------

    Attachment: FBX_BID.80.log.2014-04-10-00-47.log

Attached an example file for testing. The unit test below copies the read loop from LineDeserializer. With the code as is, it prints 3333; if you modify it to double the default buffer size, it prints 5465.

    import java.io.File;
    import java.io.IOException;
    import com.google.common.base.Charsets;
    import org.apache.flume.serialization.PositionTracker;
    import org.apache.flume.serialization.ResettableFileInputStream;
    import org.junit.Test;

    @Test
    public void lowLevelTest() throws Exception {
        File file = new File("/tmp/FBX_BID.80.log.2014-04-10-00-47.log");
        // No-op tracker: always start at offset 0 and never persist the position.
        PositionTracker positionTracker = new PositionTracker() {
            @Override
            public void storePosition(long l) throws IOException {
            }

            @Override
            public long getPosition() {
                return 0;
            }

            @Override
            public String getTarget() {
                return null;
            }

            @Override
            public void close() throws IOException {
            }
        };
        ResettableFileInputStream fileInputStream = null;
        int lineCnt = 0;
        try {
            fileInputStream = new ResettableFileInputStream(file, positionTracker,
                    ResettableFileInputStream.DEFAULT_BUF_SIZE, Charsets.UTF_8);
            int c;
            // Same loop as LineDeserializer: read chars until readChar() returns -1.
            while ((c = fileInputStream.readChar()) != -1) {
                // FIXME: support \r\n
                if (c == '\n') {
                    lineCnt++;
                }
            }
        } finally {
            if (fileInputStream != null) {
                fileInputStream.close();
            }
        }
        System.out.println("lineCnt: " + lineCnt);
    }
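
For comparison, a baseline count that does not go through ResettableFileInputStream can be obtained with JDK classes only. The sketch below is not part of Flume; the class name NewlineBaseline and the method countNewlines are made up for illustration. It counts 0x0A bytes directly, which is safe for UTF-8 because that byte value never occurs inside a multi-byte sequence.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of Flume: counts '\n' bytes in a stream.
class NewlineBaseline {

    static long countNewlines(InputStream in) throws IOException {
        long count = 0;
        int b;
        // Read raw bytes; no charset decoding is involved at all.
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path of the file to check, e.g. the attached log file.
        try (InputStream in = new BufferedInputStream(
                Files.newInputStream(Paths.get(args[0])))) {
            System.out.println("newlines: " + countNewlines(in));
        }
    }
}
```

If the figure printed by this baseline differs from what the unit test prints at the default buffer size, the difference is the number of lines lost after the read loop stops early.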

> Spooling Source: LineDeserializer incorrectly stops reading events if UTF-8 char occurs at end of buffer
> --------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2360
>                 URL: https://issues.apache.org/jira/browse/FLUME-2360
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: CDH 4.4
>            Reporter: Keith Wright
>         Attachments: FBX_BID.80.log.2014-04-10-00-47.log
>
>
> We are using a Spooling source to read from files and have noticed that there are situations where the source will stop reading in the middle of a file (see attached example file).  After much debugging, I have determined that this occurs if the buffer in ResettableFileInputStream happens to fill while in the middle of handling a multi-byte UTF-8 character.  In this case readChar() will return -1, which ends the line-reading loop in LineDeserializer.  Running the spool source with the attached file should result in more than 5400 events in the channel, but currently it only produces about 3300.
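
The boundary condition described above can be reproduced with the JDK's CharsetDecoder alone. The following is a minimal sketch, not Flume's code and not necessarily how the issue should be fixed; the class SplitUtf8Demo and its methods are hypothetical names. It shows that when a chunk ends in the middle of a multi-byte character, the decoder leaves the partial bytes unconsumed, and that carrying them over to the next refill (rather than treating "nothing more decoded" as end of file) keeps the line count correct.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Flume.
class SplitUtf8Demo {

    /** Returns how many bytes of the chunk the decoder could not consume yet. */
    static int undecodedBytes(byte[] chunk) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(chunk);
        CharBuffer out = CharBuffer.allocate(chunk.length);
        decoder.decode(in, out, false); // false: more input may follow
        return in.remaining();
    }

    /** Counts '\n' characters while reading data through a small byte buffer. */
    static int countLines(byte[] data, int bufSize) {
        if (bufSize < 4) {
            // A UTF-8 character can take up to 4 bytes; a smaller buffer could stall.
            throw new IllegalArgumentException("bufSize must be at least 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer buf = ByteBuffer.allocate(bufSize);
        CharBuffer chars = CharBuffer.allocate(bufSize);
        int offset = 0;
        int lines = 0;
        while (offset < data.length) {
            // Refill: new bytes go after any leftover bytes kept by compact().
            int n = Math.min(buf.remaining(), data.length - offset);
            buf.put(data, offset, n);
            offset += n;
            buf.flip();
            decoder.decode(buf, chars, offset == data.length);
            chars.flip();
            while (chars.hasRemaining()) {
                if (chars.get() == '\n') {
                    lines++;
                }
            }
            chars.clear();
            // Keep the undecoded tail (a split character) for the next pass.
            buf.compact();
        }
        return lines;
    }

    public static void main(String[] args) {
        // 'é' (U+00E9) encodes as the two bytes 0xC3 0xA9.
        byte[] data = "abc\u00e9\nxyz\n".getBytes(StandardCharsets.UTF_8);
        byte[] firstChunk = {data[0], data[1], data[2], data[3]}; // ends with 0xC3
        System.out.println("undecoded bytes in first chunk: " + undecodedBytes(firstChunk));
        System.out.println("lines with 4-byte buffer: " + countLines(data, 4));
    }
}
```

With a 4-byte buffer the first chunk ends on the lead byte of 'é', so one byte is left undecoded; because that byte is retained by compact() and completed on the next refill, both lines are still counted.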


