Posted to dev@flume.apache.org by Eric Sammer <es...@cloudera.com> on 2011/10/21 01:32:21 UTC

NG ExecSource - documented warning

I've included the following in the javadoc for the ExecSource in NG:

"""

*org.apache.flume.source.ExecSource*

A Source implementation that executes a Unix process and turns each line of
text into an event.

The ExecSource is meant for situations where one must integrate with
existing systems without modifying code. It is a compatibility gateway built
to allow simple, stop-gap integration and doesn't necessarily offer all of
the benefits or guarantees of native integration with Flume. If one has the
option of using the AvroSource, for instance, that would be greatly
preferred to this source, as AvroSource (and similarly implemented sources)
can maintain the transactional guarantees that exec cannot.

Why doesn't *ExecSource* offer transactional guarantees?

The problem with ExecSource and other asynchronous sources is that the
source cannot guarantee that, if there is a failure to put the event into
the Channel, the client knows about it. For instance, one of the most
commonly requested features is the tail -F [file]-like use case where an
application writes to a log file on disk and Flume tails the file, sending
each line as an event. While this is possible, there's an obvious problem:
what happens if the channel fills up and Flume can't send an event? Flume
has no way of indicating to the application writing the log file that it
needs to retain the log or that the event hasn't been sent. If this doesn't
make sense, you need only know this: *Your application can never guarantee
data has been received when using a unidirectional asynchronous interface
such as ExecSource!* As an extension of this warning, and to be completely
clear: there is absolutely zero guarantee of event delivery when using this
source. You have been warned.

"""
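
To make the failure mode in the warning concrete, here is a minimal sketch
in plain Java. It does not use any Flume classes: a bounded queue stands in
for a channel, a list of strings stands in for the lines an application
wrote to its log, and the names LossyTailSketch and forward are hypothetical,
chosen only for illustration. The point is that the drop count is known only
to the forwarding side; the writer of the lines has no way to learn of it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustration only: none of these names are Flume APIs.
class LossyTailSketch {

    /**
     * Offers each line to a bounded queue (a stand-in for a channel) and
     * returns how many lines were dropped because the queue was full.
     * The application that produced the lines is never told about drops.
     */
    static int forward(List<String> lines, BlockingQueue<String> channel) {
        int dropped = 0;
        for (String line : lines) {
            // offer() returns false right away when the queue has no room.
            if (!channel.offer(line)) {
                dropped++;
            }
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Capacity 2 and nothing draining the queue: a "full channel".
        BlockingQueue<String> channel = new ArrayBlockingQueue<>(2);
        List<String> written = Arrays.asList("line 1", "line 2", "line 3", "line 4");

        int dropped = forward(written, channel);

        // The writer already considers all four lines logged.
        System.out.println("accepted=" + channel.size() + " dropped=" + dropped);
    }
}
```

Run as written, this prints accepted=2 dropped=2. A bidirectional client
would instead receive an error for the rejected events and could retry.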

Does anyone feel this isn't clear, or disagree with this warning? I'd
like to make sure this is *very* well understood by users going forward.
The same applies to any kind of source similar to exec (which absolutely
includes "tail").


Feedback welcome / appreciated.
-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: NG ExecSource - documented warning

Posted by Eric Sammer <es...@cloudera.com>.
On Thu, Oct 20, 2011 at 11:43 PM, Mingjie Lai <mj...@gmail.com> wrote:

>
> Eric.
>
> It makes sense to me. I also agree it can be applied to similar sources,
> such as the UDP source I'm dealing with.
>
> > that executes a Unix process and ...
> Can also be a Windows command, right?
>

You know, I haven't tested it, but provided ProcessBuilder / Process and
stdout work "as expected" on Windows, I don't see why it wouldn't work just
fine. My Windows knowledge is severely limited.
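
For anyone who wants to try this on either platform, the following is a
minimal sketch of the ProcessBuilder approach: start a command and treat
each line of its standard output as one event body. It is not the actual
ExecSource code. The class name ExecLinesSketch, the method runAndCollect,
and the sample commands are assumptions made for illustration, and the
Windows branch is untested.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: not the Flume ExecSource implementation.
class ExecLinesSketch {

    /** Runs the command and returns each output line as one "event" body. */
    static List<String> runAndCollect(List<String> command)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        // Merge stderr into stdout so the child cannot stall on an unread
        // stderr pipe; note that stderr lines then become events as well.
        builder.redirectErrorStream(true);
        Process process = builder.start();

        List<String> events = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null once the process closes its output.
            while ((line = reader.readLine()) != null) {
                events.add(line);
            }
        }
        process.waitFor();
        return events;
    }

    public static void main(String[] args) throws Exception {
        boolean windows =
                System.getProperty("os.name").toLowerCase().startsWith("windows");
        // Hypothetical sample commands; substitute whatever should be executed.
        List<String> command = windows
                ? Arrays.asList("cmd", "/c", "echo hello")
                : Arrays.asList("sh", "-c", "printf 'hello\\nworld\\n'");

        for (String event : runAndCollect(command)) {
            System.out.println("event: " + event);
        }
    }
}
```

Nothing in the sketch is Unix-specific apart from the sample command, which
is consistent with the expectation above, but that remains to be confirmed
on a real Windows machine.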


>
> Thanks,
> Mingjie
>


-- 
Eric Sammer
twitter: esammer
data: www.cloudera.com

Re: NG ExecSource - documented warning

Posted by Mingjie Lai <mj...@gmail.com>.
Eric.

It makes sense to me. I also agree it can be applied to similar sources,
such as the UDP source I'm dealing with.

 > that executes a Unix process and ...
Can also be a Windows command, right?

Thanks,
Mingjie
