Posted to dev@flume.apache.org by tinawenqiao <31...@qq.com> on 2016/07/06 03:38:54 UTC

Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Hi all,
   I have submitted a pull request against Flume 1.7 on GitHub: https://github.com/apache/flume/pull/54 .
   The changes are as follows:
   1.  Support multiline events. Users can define a regex that marks the start of a multiline event.
        Add a parameter REGEX_START in TaildirSourceConfigurationConstants.java. REGEX_START is used to generate Flume events whose body contains multiple lines: the regex determines where each event starts. The default value is "", in which case each line terminated by '\n' becomes one Flume event.
        Sample usage:
        agent.sources.taildirsource.lineStartRegex = \\s?\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d,\\d\\d\\d

   2.  Support recursive directories. Wildcards are allowed in directory names.
        Modify the function getMatchFiles() in ReliableTaildirEventReader.java to support this functionality.
        Sample usage:
        agent.sources.taildirsource.filegroups.f1 = /Users/wenqiao/work/flume/apache-flume-1.7.0-SNAPSHOT-bin/conf/*/01/[ab].log

   3.  Fix the bug that occurs when a line is longer than 8192 bytes, and make the buffer size configurable.
        Add a parameter BUFFER_SIZE in TaildirSourceConfigurationConstants.java. BUFFER_SIZE defines the maximum number of bytes in one Flume event body. The default size is 8192.

   4.  Put the file path, hostname, and IP address into the headers of a Flume event if the headers do not already contain those keys.
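Item 1 can be illustrated with a small sketch. It assumes the source already yields complete lines; `MultilineGrouper` is a hypothetical helper, not the code in the pull request and not a Flume API. A line matching the start regex opens a new event, every other line is appended to the current event, and an empty regex falls back to one event per line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper (not part of Flume): groups lines into event bodies.
 * A line matching the start regex opens a new event; other lines are appended
 * to the current event. An empty regex means one event per line.
 */
final class MultilineGrouper {
  private final Pattern start; // null when the configured regex is empty

  MultilineGrouper(String lineStartRegex) {
    this.start = lineStartRegex.isEmpty() ? null : Pattern.compile(lineStartRegex);
  }

  List<String> group(List<String> lines) {
    List<String> events = new ArrayList<>();
    StringBuilder current = null;
    for (String line : lines) {
      // lookingAt() anchors the regex at the start of the line only.
      boolean isStart = start == null || start.matcher(line).lookingAt();
      // Leading lines that appear before any start match are grouped into one event.
      if (isStart || current == null) {
        if (current != null) {
          events.add(current.toString());
        }
        current = new StringBuilder(line);
      } else {
        current.append('\n').append(line);
      }
    }
    if (current != null) {
      events.add(current.toString()); // flush the last open event
    }
    return events;
  }
}
```

With the sample regex from item 1, a stack trace following a timestamped line ends up in the same event body. In a tailing source, the last event cannot be closed until the next start line arrives, so a real implementation would also need a flush condition (for example a timeout); this sketch leaves that out.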
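For item 3, one possible way to avoid splitting lines longer than 8192 bytes is to treat the buffer only as a read-chunk size and carry the unfinished part of a line across reads. `ChunkedLineReader` is hypothetical and deliberately differs from the proposal, where BUFFER_SIZE caps the event body size; it is meant to show the chunk-boundary handling, not the pull request's exact semantics.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Flume's TailFile): reads the stream in chunks of
 * bufferSize bytes but assembles lines across chunk boundaries, so a line
 * longer than the buffer is emitted whole instead of being cut.
 */
final class ChunkedLineReader {
  private final int bufferSize;

  ChunkedLineReader(int bufferSize) {
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("bufferSize must be > 0");
    }
    this.bufferSize = bufferSize;
  }

  List<String> readLines(InputStream in) throws IOException {
    List<String> lines = new ArrayList<>();
    ByteArrayOutputStream pending = new ByteArrayOutputStream(); // bytes of the unfinished line
    byte[] buf = new byte[bufferSize];
    int n;
    while ((n = in.read(buf)) != -1) {
      int lineStart = 0;
      for (int i = 0; i < n; i++) {
        if (buf[i] == '\n') {
          pending.write(buf, lineStart, i - lineStart);
          lines.add(pending.toString(StandardCharsets.UTF_8.name()));
          pending.reset();
          lineStart = i + 1;
        }
      }
      // Carry the incomplete tail of this chunk over to the next read.
      pending.write(buf, lineStart, n - lineStart);
    }
    // A trailing line without '\n' is not returned; a tailing source would
    // wait for more data before emitting it.
    return lines;
  }
}
```

Splitting on the byte 0x0A is safe for UTF-8 input, because that byte never occurs inside a multi-byte character.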

Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Posted by Mike Percy <mp...@apache.org>.
Great, thanks!

Mike

On Fri, Jul 8, 2016 at 6:44 PM, 小火火 <31...@qq.com> wrote:

> Thanks for your suggestions. I will create JIRAs and provide patches.

Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Posted by 小火火 <31...@qq.com>.
Thanks for your suggestions. I will create JIRAs and provide patches.




------------------ Original message ------------------
From: "Mike Percy" <mp...@apache.org>
Sent: Saturday, July 9, 2016, 9:29 AM
To: "dev@flume.apache.org" <de...@flume.apache.org>
Cc: "小火火" <31...@qq.com>
Subject: Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Posted by Mike Percy <mp...@apache.org>.
Hi tinawenqiao,
Thank you very much for your contribution!

In short, I agree with Attila.

We will occasionally merge a very small pull request (like a small docs
change), but large changes need JIRAs to track them.

Please file one JIRA (and provide a patch) for the bug fix, and please file
at least one JIRA (and patch) for the features to be added.

Thanks,
Mike

On Wed, Jul 6, 2016 at 1:05 AM, Attila Simon <sa...@cloudera.com> wrote:


Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Posted by Attila Simon <sa...@cloudera.com>.
Hi tinawenqiao,

Thanks for moving this conversation from GitHub to the Flume dev list. I
believe this is the best place to discuss development efforts. As
mentioned, we generally don't accept pull requests, so please create one or
more JIRAs (depending on how many different issues you would like to
address) and attach your patch(es) to them, as described in detail on the
How to Contribute page:
https://cwiki.apache.org/confluence/display/FLUME/How+to+Contribute

Regarding your proposed changes:
1) Sounds like a good improvement for TaildirSource (please check how the
Spooling Directory Source supports something similar using a deserializer:
https://flume.apache.org/FlumeUserGuide.html#spooling-directory-source).
Your logic could be part of a new deserializer.
2) Sounds like a good improvement for TaildirSource, but it would be good
to avoid inventing a new pattern syntax. (On a side note, please check
out the latest development on the Spooling Directory Source: it is now
capable of scanning a directory subtree recursively, so it might already
have what you want to achieve.)
3) The bug fix sounds awesome.
4) This looks very specific to your use case. I believe it could be
generalised a little more and/or driven by configuration parameter(s).
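Point 2 suggests reusing an existing pattern syntax. As a sketch under that assumption, Java's standard glob syntax (`FileSystem.getPathMatcher("glob:...")`) already distinguishes `*`, which stays within one directory level, from `**`, which crosses directory boundaries. `GlobFileFinder` is a hypothetical helper, not part of Flume, and it walks the whole tree under the base directory on every call, which may be too expensive for large trees.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: finds regular files under baseDir whose relative path matches a glob. */
final class GlobFileFinder {
  static List<Path> find(Path baseDir, String relativeGlob) throws IOException {
    PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + relativeGlob);
    // Files.walk is lazy and holds directory handles, so close it with try-with-resources.
    try (Stream<Path> paths = Files.walk(baseDir)) {
      return paths
          .filter(Files::isRegularFile)
          .filter(p -> matcher.matches(baseDir.relativize(p))) // match the path relative to baseDir
          .sorted()
          .collect(Collectors.toList());
    }
  }
}
```

With a base directory of `.../conf`, the glob `*/01/[ab].log` reproduces the single-level wildcard from the original sample, while `**/01/[ab].log` would match `01` directories at any depth.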
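For point 4, a configuration-driven variant might let the user choose the header names and only fill in keys that are absent. `HeaderDecorator` and its constructor arguments are hypothetical and stand in for whatever configuration keys such a change would define; Flume's existing Host Interceptor may already cover the hostname/IP part.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not a Flume API): adds configurable headers to an
 * event's header map without overwriting keys that are already present.
 */
final class HeaderDecorator {
  private final String filePathKey;               // header name for the file path; null disables it
  private final Map<String, String> fixedHeaders; // values that never change, e.g. hostname and IP

  HeaderDecorator(String filePathKey, Map<String, String> fixedHeaders) {
    this.filePathKey = filePathKey;
    this.fixedHeaders = Collections.unmodifiableMap(new LinkedHashMap<>(fixedHeaders));
  }

  /** Resolves hostname and IP once, under header names chosen by configuration. */
  static Map<String, String> localHostHeaders(String hostKey, String ipKey)
      throws UnknownHostException {
    InetAddress local = InetAddress.getLocalHost();
    Map<String, String> m = new LinkedHashMap<>();
    m.put(hostKey, local.getHostName());
    m.put(ipKey, local.getHostAddress());
    return m;
  }

  /** Returns a copy of the headers with the configured keys added only where absent. */
  Map<String, String> decorate(Map<String, String> headers, String filePath) {
    Map<String, String> out = new HashMap<>(headers);
    if (filePathKey != null) {
      out.putIfAbsent(filePathKey, filePath);
    }
    fixedHeaders.forEach(out::putIfAbsent);
    return out;
  }
}
```

Resolving hostname and IP once at startup, rather than per event, keeps the per-event cost to a few map insertions.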


Cheers,
Attila

Attila Simon
Software Engineer
Email:   sati@cloudera.com




On Wed, Jul 6, 2016 at 5:42 AM, 黄鹏程 <gn...@foxmail.com> wrote:

Re: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable

Posted by 黄鹏程 <gn...@foxmail.com>.
Fantastic features! I support this pull request!




------------------ Original message ------------------
From: "文乔" <31...@qq.com>
Sent: Wednesday, July 6, 2016, 11:38 AM
To: "dev" <de...@flume.apache.org>
Subject: Add support for multiline events and recursive directories in TaildirSource (Flume 1.7), and make the buffer size configurable