Posted to dev@flume.apache.org by "Israel Ekpo (JIRA)" <ji...@apache.org> on 2013/04/11 13:25:16 UTC

[jira] [Updated] (FLUME-1988) Add Support for Additional Deserializers for SpoolingDirectorySource

     [ https://issues.apache.org/jira/browse/FLUME-1988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Israel Ekpo updated FLUME-1988:
-------------------------------

    Description: 
There are certain use cases for SpoolingDirectorySource where the events in the log file are not delimited with newline characters.

Log files that contain stack traces, XML documents, or pretty-printed JSON strings can contain multiple newline characters within each event.

We can use alternative logic such as specific characters, strings or regular expressions to determine when the event is complete.

Hence I am proposing the following new deserializers, based on org.apache.flume.serialization.LineDeserializer:

# org.apache.flume.serialization.RegexDelimiterDeSerializer
Allows the user to specify a regular expression that is a delimiter for events within the log file

# org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
Allows the user to specify a comma-separated character sequence that is a delimiter for events within the log file.
The user specifies the integer ASCII code of each character, and the resulting sequence is used as the delimiter.
For example, support for \r\n could be specified as 13,10.
A list of codes is available at http://www.asciitable.com/
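
To make the two delimiter styles concrete, here is a minimal sketch of the splitting logic only. It is not the proposed implementation and does not use any Flume API: the class name DelimiterSketch and its methods are hypothetical, and it works on an in-memory string, whereas a real deserializer would read incrementally from the spooled file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration; none of these names exist in Flume.
public class DelimiterSketch {

    // Turns a comma-separated list of ASCII codes, e.g. "13,10",
    // into the delimiter string those codes spell out ("\r\n").
    public static String parseCharCodes(String spec) {
        StringBuilder delimiter = new StringBuilder();
        for (String code : spec.split(",")) {
            int value = Integer.parseInt(code.trim());
            if (value < 0 || value > 127) {
                throw new IllegalArgumentException("Not an ASCII code: " + value);
            }
            delimiter.append((char) value);
        }
        return delimiter.toString();
    }

    // Splits text into events wherever the regular expression matches,
    // skipping empty events.
    public static List<String> splitEvents(String text, Pattern delimiter) {
        List<String> events = new ArrayList<>();
        for (String event : delimiter.split(text)) {
            if (!event.isEmpty()) {
                events.add(event);
            }
        }
        return events;
    }

    // Splits on a literal character sequence; the delimiter is quoted
    // so that regex metacharacters in it are not interpreted.
    public static List<String> splitEvents(String text, String literalDelimiter) {
        return splitEvents(text, Pattern.compile(Pattern.quote(literalDelimiter)));
    }

    public static void main(String[] args) {
        // Character-sequence style: events end with \r\n, bare \n stays inside an event.
        String crlf = parseCharCodes("13,10");
        List<String> byChars = splitEvents("line one\nline two\r\nsecond event\r\n", crlf);
        System.out.println(byChars.size() + " events split on 13,10");

        // Regex style: a new event starts at a newline followed by a date,
        // so the stack trace line stays attached to its log entry.
        Pattern newEntry = Pattern.compile("\\n(?=\\d{4}-\\d{2}-\\d{2})");
        String log = "2013-04-11 ERROR boom\n\tat Foo.bar\n2013-04-11 INFO ok";
        List<String> byRegex = splitEvents(log, newEntry);
        System.out.println(byRegex.size() + " events split on regex");
    }
}
```

Quoting the literal delimiter keeps the character-sequence case independent of regex syntax, while the regex case leaves the choice of boundary entirely to the user.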

We will also need to update the user guide with examples on how to configure and specify a custom deserializer.

  was:
There are certain use cases for SpoolingDirectorySource where the events in the log file are not delimited with newline characters.

Certain log files that contain stack traces, xml documents and pretty JSON strings seem to contain multiple new line characters within each event.

We can use alternative logic such as specific characters, strings or regular expressions to determine when the event is complete.

Hence I am proposing the following new deserializers based on org.apache.flume.serialization.LineDeserializer

# org.apache.flume.serialization.RegexDelimiterDeSerializer
Allows the user to specify a regular expression that is a delimiter for events within the log file

# org.apache.flume.serialization.RegexDelimiterDeSerializer
Allows the user to specify a comma separated character sequence that is a delimiter for events within the log file
The user will specify an integer for the ascii characters and we will use that as the delimter.
For example support for \r\n could be specified as 13,10
A list of codes is available at http://www.asciitable.com/

We will also need to update the user guide with examples on how to configure and specify a custom deserializer.

    
> Add Support for Additional Deserializers for SpoolingDirectorySource
> --------------------------------------------------------------------
>
>                 Key: FLUME-1988
>                 URL: https://issues.apache.org/jira/browse/FLUME-1988
>             Project: Flume
>          Issue Type: New Feature
>          Components: Docs, Sinks+Sources
>    Affects Versions: v1.4.0
>            Reporter: Israel Ekpo
>            Assignee: Israel Ekpo
>             Fix For: v1.4.0
>
>
> There are certain use cases for SpoolingDirectorySource where the events in the log file are not delimited with newline characters.
> Log files that contain stack traces, XML documents, or pretty-printed JSON strings can contain multiple newline characters within each event.
> We can use alternative logic such as specific characters, strings or regular expressions to determine when the event is complete.
> Hence I am proposing the following new deserializers based on org.apache.flume.serialization.LineDeserializer
> # org.apache.flume.serialization.RegexDelimiterDeSerializer
> Allows the user to specify a regular expression that is a delimiter for events within the log file
> # org.apache.flume.serialization.CharSequenceDelimiterDeSerializer
> Allows the user to specify a comma-separated character sequence that is a delimiter for events within the log file.
> The user specifies the integer ASCII code of each character, and the resulting sequence is used as the delimiter.
> For example, support for \r\n could be specified as 13,10.
> A list of codes is available at http://www.asciitable.com/
> We will also need to update the user guide with examples on how to configure and specify a custom deserializer.
