You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flume.apache.org by Juhani Connolly <ju...@gmail.com> on 2012/07/10 11:56:29 UTC

Review Request: FLUME-1361: Add event batching to ExecSource

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/
-----------------------------------------------------------

Review request for Flume.


Description
-------

Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction


This addresses bug FLUME-1361.
    https://issues.apache.org/jira/browse/FLUME-1361


Diffs
-----

  flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
  flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 

Diff: https://reviews.apache.org/r/5880/diff/


Testing
-------

Unit tests pass though new ones were not added.

Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource


Thanks,

Juhani Connolly


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Juhani Connolly <ju...@gmail.com>.

> On July 10, 2012, 12:10 p.m., Jarek Cecho wrote:
> > flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java, line 123
> > <https://reviews.apache.org/r/5880/diff/2/?file=121202#file121202line123>
> >
> >     Could you substitute "," with "<" please? :-)

oops, got that. Adding user guide docs with patch, thanks for reminding me


- Juhani


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9014
-----------------------------------------------------------


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Jarek Cecho <ja...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9014
-----------------------------------------------------------


Hi Juhani,
code changes seems fine to me. Could you also provide update of the user guide with description of your changes?


flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java
<https://reviews.apache.org/r/5880/#comment19170>

    Could you substitute "," with "<" please? :-)


- Jarek Cecho


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Juhani Connolly <ju...@gmail.com>.

> On July 10, 2012, 5:58 p.m., Hari Shreedharan wrote:
> > Juhani,
> > 
> > Thanks for the patch. As such I am ok with this patch. 
> > 
> > Considering it is an exec source, it is a given that it is a fire and forget source. The only issue I see with this patch, is that it is possible that if Flume fails/node crashes, uto bufferCount lines of output may not be persisted to the channel, though it is a simple extension of the fact that 1 event may not have been persisted in the current code as well. So maybe we should mention that in the java docs. Also we should mention the default.

I've documented the default in both the javadoc and(in the updated patch I'll put up) user guide.

As to the loss risk, the current one is also susceptible to more than one line... It will lose as much as gets picked up in between commits. It doesn't actually wait for the buffer to fill up, it  just keeps taking until the buffer is full, OR there is nothing to take, so in a fast channel it will just be entering an event at a time. I documented it in the user guide as the "max number of lines to read and send to a channel at a time". 


- Juhani


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9025
-----------------------------------------------------------


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9025
-----------------------------------------------------------


Juhani,

Thanks for the patch. As such I am ok with this patch. 

Considering it is an exec source, it is a given that it is a fire and forget source. The only issue I see with this patch, is that it is possible that if Flume fails/node crashes, uto bufferCount lines of output may not be persisted to the channel, though it is a simple extension of the fact that 1 event may not have been persisted in the current code as well. So maybe we should mention that in the java docs. Also we should mention the default.

- Hari Shreedharan


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Hari Shreedharan <hs...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9049
-----------------------------------------------------------

Ship it!


+1. lgtm. Please fix the documentation and attach the patch to the jira.


flume-ng-doc/sphinx/FlumeUserGuide.rst
<https://reviews.apache.org/r/5880/#comment19202>

    The default is 20 according to the code, isn't it?


- Hari Shreedharan


On July 11, 2012, 4:03 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 11, 2012, 4:03 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
>   flume-ng-doc/sphinx/FlumeUserGuide.rst 7967d7c 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Juhani Connolly <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/
-----------------------------------------------------------

(Updated July 11, 2012, 5:25 a.m.)


Review request for Flume.


Changes
-------

Corrected documented default from 100 to 20


Description
-------

Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction


This addresses bug FLUME-1361.
    https://issues.apache.org/jira/browse/FLUME-1361


Diffs (updated)
-----

  flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
  flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
  flume-ng-doc/sphinx/FlumeUserGuide.rst 7967d7c 

Diff: https://reviews.apache.org/r/5880/diff/


Testing
-------

Unit tests pass though new ones were not added.

Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource


Thanks,

Juhani Connolly


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Juhani Connolly <ju...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/
-----------------------------------------------------------

(Updated July 11, 2012, 4:03 a.m.)


Review request for Flume.


Changes
-------

Updated with suggested changes/added docs.

I also added a couple of sentences to the FileChannel about using buffered sinks/sources to get decent throughput.


Description
-------

Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction


This addresses bug FLUME-1361.
    https://issues.apache.org/jira/browse/FLUME-1361


Diffs (updated)
-----

  flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
  flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
  flume-ng-doc/sphinx/FlumeUserGuide.rst 7967d7c 

Diff: https://reviews.apache.org/r/5880/diff/


Testing
-------

Unit tests pass though new ones were not added.

Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource


Thanks,

Juhani Connolly


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Juhani Connolly <ju...@gmail.com>.

> On July 10, 2012, 5:46 p.m., Brock Noland wrote:
> > lgtm. One item, we call this kind of configuration "batchSize" most places correct?

Sounds better to me, done


- Juhani


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9024
-----------------------------------------------------------


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>


Re: Review Request: FLUME-1361: Add event batching to ExecSource

Posted by Brock Noland <br...@cloudera.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5880/#review9024
-----------------------------------------------------------


lgtm. One item, we call this kind of configuration "batchSize" most places correct?

- Brock Noland


On July 10, 2012, 9:56 a.m., Juhani Connolly wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5880/
> -----------------------------------------------------------
> 
> (Updated July 10, 2012, 9:56 a.m.)
> 
> 
> Review request for Flume.
> 
> 
> Description
> -------
> 
> Added a new configuration setting "bufferCount" to allow for multiple lines read from the exec output to be committed in a single channel transaction
> 
> 
> This addresses bug FLUME-1361.
>     https://issues.apache.org/jira/browse/FLUME-1361
> 
> 
> Diffs
> -----
> 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSource.java dbf79e0 
>   flume-ng-core/src/main/java/org/apache/flume/source/ExecSourceConfigurationConstants.java 73c985e 
> 
> Diff: https://reviews.apache.org/r/5880/diff/
> 
> 
> Testing
> -------
> 
> Unit tests pass though new ones were not added.
> 
> Further, tested this on our log generating servers and it alleviated the incredibly poor performance of file channel caused by the constant commits coming from ExecSource
> 
> 
> Thanks,
> 
> Juhani Connolly
> 
>