Posted to dev@flume.apache.org by "David Burgos. Isban (Banco Santander) (JIRA)" <ji...@apache.org> on 2015/12/09 09:26:11 UTC

[jira] [Updated] (FLUME-2800) Multiline log events for Taildir Source

     [ https://issues.apache.org/jira/browse/FLUME-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Burgos. Isban (Banco Santander) updated FLUME-2800:
---------------------------------------------------------
    Description: 
This is a proposed implementation for handling multiline log messages in the new tailing source (FLUME-2498).
It is based on the idea in FLUME-2779, MultiLine Deserializer for Spooling Directory Source.

Configuration:
* multiLineRegex: Regular expression used to handle multiline log messages (grok expressions are permitted).
* grokDictionaryDir: Directory containing custom grok dictionaries.
* maxNumberLines: Maximum number of lines per event for multiline log messages. Default: 100. Lines beyond this limit are never transferred to the sink.

Regular expressions are evaluated with the Java regex engine.
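A minimal sketch of how multiLineRegex and maxNumberLines might interact, assuming that multiLineRegex matches the first line of each new log event and that every non-matching line is a continuation of the previous event (the description does not say which convention the patch uses). The class MultilineGrouper and its group method are hypothetical, not part of Flume or the attached patch; lines beyond maxNumberLines are simply discarded, mirroring the documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of multiline event grouping (not the patch's code).
 * Assumption: multiLineRegex matches the FIRST line of a new log event;
 * any line that does not match continues the current event.
 */
final class MultilineGrouper {
    private final Pattern startOfEvent;
    private final int maxNumberLines;

    MultilineGrouper(String multiLineRegex, int maxNumberLines) {
        this.startOfEvent = Pattern.compile(multiLineRegex);
        this.maxNumberLines = maxNumberLines;
    }

    /** Groups raw lines into events; an event never exceeds maxNumberLines lines. */
    List<String> group(List<String> lines) {
        List<String> events = new ArrayList<String>();
        StringBuilder current = null;
        int count = 0;
        for (String line : lines) {
            if (current == null || startOfEvent.matcher(line).lookingAt()) {
                // A matching line (or the very first line) starts a new event.
                if (current != null) {
                    events.add(current.toString());
                }
                current = new StringBuilder(line);
                count = 1;
            } else if (count < maxNumberLines) {
                // Continuation line within the limit: append it to the current event.
                current.append('\n').append(line);
                count++;
            }
            // Otherwise the continuation line exceeds maxNumberLines and is dropped.
        }
        if (current != null) {
            events.add(current.toString());
        }
        return events;
    }

    public static void main(String[] args) {
        // Example: lines starting with an ISO date begin a new event.
        MultilineGrouper grouper = new MultilineGrouper("\\d{4}-\\d{2}-\\d{2} ", 2);
        List<String> events = grouper.group(Arrays.asList(
                "2015-12-09 ERROR boom",
                "  at A",
                "  at B",
                "2015-12-09 INFO ok"));
        for (String event : events) {
            System.out.println("[" + event + "]");
        }
    }
}
```

With maxNumberLines set to 2, the line "  at B" exceeds the limit and is dropped, so two events are printed.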

Includes functionality for expanding grok expressions into a plain regular expression with named groups (inspired by the Logstash interceptor).
By default, the included built-in grok dictionaries with predefined patterns are loaded.
https://github.com/aicer/grok
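The grok expansion step can be pictured as textual substitution: each %{PATTERN:field} reference is replaced by the pattern's body wrapped in a Java named group (?<field>...), repeating until nested references are resolved. This is a hypothetical sketch; GrokExpander and the tiny in-memory dictionary are illustrative assumptions, not the aicer/grok API or the patch's code. Named groups require Java 7 or later, and Java group names may contain only ASCII letters and digits.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: expands grok references such as %{INT:code}
 * into a plain Java regex with named groups. Not the aicer/grok API.
 */
final class GrokExpander {
    // Matches %{PATTERN} or %{PATTERN:field}; field must be a valid Java group name.
    private static final Pattern GROK_REF =
            Pattern.compile("%\\{(\\w+)(?::([A-Za-z][A-Za-z0-9]*))?\\}");

    private final Map<String, String> dictionary;

    GrokExpander(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    /** Replaces grok references pass by pass until none remain. */
    String expand(String grok) {
        String current = grok;
        for (int depth = 0; depth < 20; depth++) { // guard against cyclic definitions
            Matcher m = GROK_REF.matcher(current);
            if (!m.find()) {
                return current;
            }
            m.reset();
            StringBuffer out = new StringBuffer();
            while (m.find()) {
                String body = dictionary.get(m.group(1));
                if (body == null) {
                    throw new IllegalArgumentException("Unknown grok pattern: " + m.group(1));
                }
                String field = m.group(2);
                // Named references become capturing groups, unnamed ones non-capturing.
                String replacement = field == null
                        ? "(?:" + body + ")"
                        : "(?<" + field + ">" + body + ")";
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(out);
            current = out.toString();
        }
        throw new IllegalStateException("Grok expansion too deep (cyclic definition?)");
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<String, String>();
        dict.put("INT", "[+-]?[0-9]+");
        dict.put("WORD", "[A-Za-z]+");
        dict.put("LEVEL", "%{WORD}"); // nested reference, resolved in a later pass
        String regex = new GrokExpander(dict).expand("%{LEVEL:level} code=%{INT:code}");
        System.out.println(regex);
        Matcher m = Pattern.compile(regex).matcher("ERROR code=500");
        if (m.matches()) {
            System.out.println(m.group("level") + " " + m.group("code"));
        }
    }
}
```

Here the expanded pattern is "(?<level>(?:[A-Za-z]+)) code=(?<code>[+-]?[0-9]+)", and matching "ERROR code=500" yields level "ERROR" and code "500".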


The attached patch includes configuration documentation and unit tests.
A complete port of the patch for Flume 1.6 and Java 1.6 is also attached.

  was:
This is a proposed implementation for handling multiline log messages in the new tailing source (FLUME-2498).
It is based on the idea in FLUME-2779, MultiLine Deserializer for Spooling Directory Source.

Configuration:
* multiLineRegex: Regular expression used to handle multiline log messages (grok expressions are permitted).
* grokDictionaryDir: Directory containing custom grok dictionaries.
* maxNumberLines: Maximum number of lines per event for multiline log messages. Default: 100. Lines beyond this limit are never transferred to the sink.

Regular expressions are evaluated with the joni regex engine, which can be up to twice as fast as the Java regex engine and is more efficient, producing less object churn while scanning because it operates natively on byte arrays.
https://github.com/jruby/joni

Includes functionality for expanding grok expressions into a plain regular expression with named groups (inspired by the Logstash interceptor).
By default, the included built-in grok dictionaries with predefined patterns are loaded.
https://github.com/aicer/grok


The attached patch includes configuration documentation and unit tests.
A complete port of the patch for Flume 1.6 and Java 1.6 is also attached.


> Multiline log events for Taildir Source
> ---------------------------------------
>
>                 Key: FLUME-2800
>                 URL: https://issues.apache.org/jira/browse/FLUME-2800
>             Project: Flume
>          Issue Type: Improvement
>          Components: Sinks+Sources
>    Affects Versions: v1.6.0, v1.7.0
>            Reporter: David Burgos. Isban (Banco Santander)
>            Priority: Minor
>         Attachments: FLUME-2800-2-Flume1.6.patch, FLUME-2800-2.patch, FLUME-2800-Flume1.6.patch, FLUME-2800.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)