Posted to user@flume.apache.org by "Cochran, David M (Contractor)" <Da...@bsee.gov> on 2012/09/21 20:52:28 UTC

Event flushing?

Is there a way to automatically flush an agent/source that is tailing a file to the sink every xx seconds, even if the buffer is not yet full?

Maybe that's not worded quite right, so here is an example.

Tailing a log file and sending it to a File_roll sink works like a charm. However, if activity stops, a number of lines are still not sent to the sink, apparently waiting for the buffer to fill up.  This can be an issue for me, as I want to have a script reviewing the logs on the sink for errors and such. If something goes sideways and is recorded in the last xx lines not yet sent, it could go undetected for a long time because it has not been written to the sink.

In the case I'm looking at right now, the log I'm tailing has 10 lines that have not been sent to the sink. Since it's Friday afternoon there is little activity, actually none in the last 3 hours.  Taken a step further, if the app crashed and only wrote out 5 lines calling for help, they could go undetected for a long time.  Is there any way to flush any standing events to the sink every 30 seconds or so?

Thanks,
Dave



Re: Event flushing?

Posted by Brock Noland <br...@cloudera.com>.
Yes, I understand. I was thinking initially that the buffer was on the
tail side as most command line programs buffer data when not writing
to a terminal. After further review, I am not sure tail behaves that
way.

Can you get a jstack <pid> of the flume agent while it's waiting?
What version of flume are you running?

Depending on the version, the data is probably in one of these places:

1) In ExecSource's BufferedReader
2) In ExecSource's batch ( > 1.2)
3) In RollingFileSink's batch ( > 1.2)
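
To illustrate cases 2 and 3, here is a minimal Python sketch of a batch
that hands events on only when it reaches a fixed size. It is not Flume's
code: the class name CountBatcher and the batch size of 20 are assumptions
made up for this illustration. It shows how the last few lines of a quiet
log can sit in memory until more lines arrive.

```python
class CountBatcher:
    """Hypothetical count-only batcher: delivers events only in full batches."""

    def __init__(self, batch_size, deliver):
        self.batch_size = batch_size
        self.deliver = deliver      # callback that receives a list of events
        self.pending = []           # events waiting for the batch to fill

    def append(self, event):
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.deliver(self.pending)
            self.pending = []


delivered = []
batcher = CountBatcher(batch_size=20, deliver=delivered.extend)
for i in range(30):
    batcher.append(f"line {i}")

# One full batch of 20 was delivered; the other 10 lines are still pending.
print(len(delivered), len(batcher.pending))
```

With 30 input lines and an assumed batch size of 20, the remaining 10
lines stay in `pending` until another 10 lines show up.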

Ultimately, if you are concerned with data loss, tailing files is not a
good option. The communication from tail is one way. Beyond that,
there is no guarantee that tail has started reading the file at the
appropriate location. Since tail prints only the last 10 lines by default,
more than 10 lines of data could have been written before it starts reading.

Options with no data loss would include:

1) Waiting until the file is rotated and then copying it whole
2) Modifying the application to use the SDK to send events to, say, an Avro source
3) Syslog would be more reliable than tail as well
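
Option 1 could look roughly like the Python sketch below. It is only an
illustration under assumptions: the function name ship_rotated_logs, the
"app.log.*" naming pattern for rotated files, and the spool directory are
all hypothetical, and the demo runs against a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def ship_rotated_logs(log_dir, spool_dir, pattern="app.log.*"):
    """Copy each rotated log file whole into spool_dir, once.

    Each file is copied under a temporary name and then renamed, so a
    partially written copy never appears under its final name.
    """
    log_dir, spool_dir = Path(log_dir), Path(spool_dir)
    spool_dir.mkdir(parents=True, exist_ok=True)
    shipped = []
    for src in sorted(log_dir.glob(pattern)):
        dest = spool_dir / src.name
        if dest.exists():
            continue                  # already shipped on an earlier run
        tmp = spool_dir / (src.name + ".tmp")
        shutil.copy2(src, tmp)
        tmp.rename(dest)              # rename within one directory
        shipped.append(dest)
    return shipped


# Demo: the active file (app.log) is skipped, the rotated one is shipped once.
with tempfile.TemporaryDirectory() as root:
    logs = Path(root, "logs")
    logs.mkdir()
    (logs / "app.log").write_text("still being written\n")
    (logs / "app.log.1").write_text("rotated, complete\n")
    first = [p.name for p in ship_rotated_logs(logs, Path(root, "spool"))]
    second = [p.name for p in ship_rotated_logs(logs, Path(root, "spool"))]

print(first, second)
```

The trade-off is latency: nothing is shipped until the application
rotates its log, so this suits completeness better than fast alerting.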

Brock




-- 
Apache MRUnit - Unit testing MapReduce - http://incubator.apache.org/mrunit/

RE: Event flushing?

Posted by "Cochran, David M (Contractor)" <Da...@bsee.gov>.
Perhaps my explanation was unclear.  Flume is tailing a log file on the
app server and sinking to another box (FILE_ROLL).  I'm manually tailing
both the log file on the app server and the output file on the sink
server.  The app server log has 10 lines of entries that have yet to be
written at the sink side, and 3+ hours have elapsed since the source log
was updated.  Now, if I echo another dozen or so lines to the end of the
source log, all the lines that were waiting and (some or all of) the newly
added lines will appear at the sink.  Wash, rinse, repeat.

I'm not sure where the last few lines that need to be sent/written out
are sitting, but leaving them in limbo seems bad (at least from my
perspective).

I perhaps wrongly assumed they are sitting in some sort of buffer/bucket
that waits to be full before sending.  If this is the case, would it be a
good idea to periodically check whether there is data waiting to be
committed, even if the bucket is not full?
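
The periodic check described above might be sketched like this in Python.
It is a minimal illustration only, not Flume code and not a setting that
this thread confirms Flume offers. The class name TimedBatcher, its
parameters, and the fake clock used in the demo are all assumptions.

```python
import time


class TimedBatcher:
    """Hypothetical batcher that flushes on size or on age, whichever is first."""

    def __init__(self, batch_size, max_age_seconds, deliver, clock=time.monotonic):
        self.batch_size = batch_size
        self.max_age_seconds = max_age_seconds
        self.deliver = deliver      # callback that receives a list of events
        self.clock = clock          # injectable so the demo needs no sleeping
        self.pending = []
        self.oldest = None          # arrival time of the oldest pending event

    def append(self, event):
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(event)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def poll(self):
        """Call periodically; flushes a partial batch that has grown stale."""
        if self.pending and self.clock() - self.oldest >= self.max_age_seconds:
            self.flush()

    def flush(self):
        if self.pending:
            self.deliver(self.pending)
            self.pending = []
            self.oldest = None


# Demo with a fake clock: 10 events arrive, then the log goes quiet.
now = [0.0]
out = []
b = TimedBatcher(batch_size=20, max_age_seconds=30, deliver=out.extend,
                 clock=lambda: now[0])
for i in range(10):
    b.append(f"line {i}")
b.poll()                    # too fresh, nothing is flushed yet
before = len(out)
now[0] = 31.0               # pretend 31 seconds have passed
b.poll()                    # partial batch is now older than 30 s
after = len(out)
print(before, after, len(b.pending))
```

In a real agent, `poll` would be driven by a timer thread or a scheduled
task, and access to `pending` would need locking.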


Dave





Re: Event flushing?

Posted by Brock Noland <br...@cloudera.com>.
If you are sure the lines are in the tail buffer, what you probably
want is stdbuf:

http://www.gnu.org/software/coreutils/manual/html_node/stdbuf-invocation.html

It does, finally, look to be available in the latest distros like RHEL 6.3.
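
As a rough sketch, assuming GNU coreutils with stdbuf is installed, the
tail command can be wrapped so that its stdout is line-buffered. The
Python helpers below (line_buffered, stream_lines) and the log path are
hypothetical and only show the shape of the command. Whether tail is
really where the lines are being held is not established in this thread.

```python
import subprocess


def line_buffered(cmd):
    """Prefix cmd with `stdbuf -oL` to request line-buffered stdout."""
    return ["stdbuf", "-oL", *cmd]


def stream_lines(cmd):
    """Run cmd and yield its stdout one line at a time as it arrives."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True,
                          bufsize=1) as proc:
        for line in proc.stdout:
            yield line.rstrip("\n")


# Placeholder path; this is the command a reader process would launch.
tail_cmd = line_buffered(["tail", "-F", "/var/log/app.log"])
print(tail_cmd)
```

A consumer would then iterate with `for line in stream_lines(tail_cmd):`
and handle each line as soon as tail emits it.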

Brock
