Posted to dev@flume.apache.org by "Keith Wright (JIRA)" <ji...@apache.org> on 2014/04/10 18:50:17 UTC

[jira] [Commented] (FLUME-2360) Spooling Source: LineDeserializer incorrectly stops reading events if UTF-8 char occurs at end of buffer

    [ https://issues.apache.org/jira/browse/FLUME-2360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13965526#comment-13965526 ] 

Keith Wright commented on FLUME-2360:
-------------------------------------

FYI, as a workaround for now I have implemented my own line deserializer that uses the standard BufferedReader.readLine().  It is technically not durable (mark() and reset() are no-ops), but since our channel is a file channel (with a VERY high capacity), I have never actually had issues with the source adding events to it.

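// Imports needed at the top of the enclosing source file.
// Charsets is assumed to be Guava's, as used elsewhere in Flume.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.lang.reflect.Field;
import java.util.LinkedList;
import java.util.List;

import com.google.common.base.Charsets;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.serialization.EventDeserializer;
import org.apache.flume.serialization.ResettableFileInputStream;
import org.apache.flume.serialization.ResettableInputStream;

// Nested inside an enclosing class (hence "static").
public static class MyLineDeserializer implements EventDeserializer.Builder {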

        @Override
        public EventDeserializer build(Context context, ResettableInputStream in) {
            try {
                Field fileField = ResettableFileInputStream.class.getDeclaredField("file");
                fileField.setAccessible(true);
                in.close();
                return new MyLineReader((File)fileField.get(in));
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }

        static class MyLineReader implements EventDeserializer {

            final FileInputStream fileInputStream;
            final InputStreamReader inputStreamReader;
            final BufferedReader reader;

            MyLineReader(File file) throws Exception {
                fileInputStream = new FileInputStream(file);
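                // Decode explicitly as UTF-8 rather than the platform default,
                // matching the charset used to build the event bodies below.
                inputStreamReader = new InputStreamReader(fileInputStream, Charsets.UTF_8);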
                reader = new BufferedReader(inputStreamReader);
            }

            @Override
            public Event readEvent() throws IOException {
                List<Event> eventList = readEvents(1);
                return eventList.isEmpty() ? null : eventList.iterator().next();
            }

            @Override
            public List<Event> readEvents(int numEvents) throws IOException {
                int seenEvents = 0;
                String line;
                List<Event> eventList = new LinkedList<Event>();
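                // Check the batch limit BEFORE reading, otherwise the line read on the
                // final iteration is consumed from the reader and silently dropped.
                while (seenEvents < numEvents && (line = reader.readLine()) != null) {
                    eventList.add(EventBuilder.withBody(line, Charsets.UTF_8));
                    seenEvents++;
                }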
                return eventList;
            }

            @Override
            public void mark() throws IOException {
                // do nothing: the read position is not persisted, so this reader is not durable
            }

            @Override
            public void reset() throws IOException {
                // do nothing
            }

            @Override
            public void close() throws IOException {
                // Closing the outermost reader also closes the wrapped streams.
                reader.close();
            }
        }
}

> Spooling Source: LineDeserializer incorrectly stops reading events if UTF-8 char occurs at end of buffer
> --------------------------------------------------------------------------------------------------------
>
>                 Key: FLUME-2360
>                 URL: https://issues.apache.org/jira/browse/FLUME-2360
>             Project: Flume
>          Issue Type: Bug
>          Components: Sinks+Sources
>    Affects Versions: v1.4.0
>         Environment: CDH 4.4
>            Reporter: Keith Wright
>         Attachments: FBX_BID.80.log.2014-04-10-00-47.log
>
>
> We are using a Spooling source to read from files and have noticed that there are situations where the source will stop reading in the middle of a file (see attached example file).  After much debugging, I have determined that this occurs if the buffer in ResettableFileInputStream happens to fill while in the middle of handling a multi-byte UTF-8 character.  In this case readChar() returns -1, which ends the line-reading loop in LineDeserializer.  Running the spool source with the attached file should result in > 5400 events in the channel, but currently it only returns ~3300.
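
The failure mode described above can be reproduced with the standard java.nio decoder alone. The following is a minimal sketch, not Flume's actual ResettableFileInputStream code: the class name SplitUtf8Demo and both helper methods are hypothetical and exist only for illustration. It shows that when a buffer ends partway through a multi-byte UTF-8 character, the decoder reports underflow and leaves the partial bytes unread, so a reader has to carry those bytes into the next fill instead of treating "no character produced" as end of file.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Flume.
class SplitUtf8Demo {

    /** Decodes only the first prefixLen bytes and returns how many bytes were left undecoded. */
    static int leftoverAfterPrefix(byte[] data, int prefixLen) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(data, 0, prefixLen);
        CharBuffer out = CharBuffer.allocate(prefixLen);
        // endOfInput=false: more bytes may follow, so a trailing partial
        // character is reported as underflow rather than as malformed input.
        CoderResult result = decoder.decode(in, out, false);
        if (!result.isUnderflow()) {
            throw new IllegalStateException("unexpected decoder result: " + result);
        }
        return in.remaining();
    }

    /** Decodes data in chunks of chunkSize bytes, carrying partial characters across chunks. */
    static String decodeInChunks(byte[] data, int chunkSize) {
        if (chunkSize < 4) {
            // A UTF-8 character is at most 4 bytes; smaller buffers could stall.
            throw new IllegalArgumentException("chunkSize must be >= 4");
        }
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer buf = ByteBuffer.allocate(chunkSize);
        CharBuffer chars = CharBuffer.allocate(chunkSize);
        StringBuilder sb = new StringBuilder();
        int pos = 0;
        while (pos < data.length) {
            // Fill the free space in the buffer (after any carried-over bytes).
            int n = Math.min(buf.remaining(), data.length - pos);
            buf.put(data, pos, n);
            pos += n;
            buf.flip();
            decoder.decode(buf, chars, pos == data.length);
            chars.flip();
            sb.append(chars);
            chars.clear();
            // compact() moves any undecoded trailing bytes to the front of the
            // buffer so the next fill completes the split character.
            buf.compact();
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "é" is two bytes in UTF-8, so a 4-byte prefix of "abcé" ends mid-character.
        byte[] data = "abc\u00e9d".getBytes(StandardCharsets.UTF_8);
        System.out.println("leftover bytes after 4-byte prefix: " + leftoverAfterPrefix(data, 4));
        System.out.println("chunked decode: " + decodeInChunks(data, 4));
    }
}
```

The key design point in this sketch is the compact() call: the unread tail of one buffer becomes the head of the next, so a character that straddles the boundary is decoded once both halves are available.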



--
This message was sent by Atlassian JIRA
(v6.2#6252)