Posted to user@flume.apache.org by "Cochran, David" <da...@bsee.gov> on 2012/12/21 21:26:50 UTC

post-processing

Just had a thought... before I turn this script loose and make a mess of
things, I figured I'd ask the group...

I'm running Flume 1.3 with FILE_ROLL at the sink. The 'live, in use' files
are periodically scanned for key events while they are still live and being
appended to by Flume. No problems there, as they are only being read.

Now the interesting part: I also need to do a little processing of the
stored logs (using sed) to insert a couple of pieces of data into each line
(if they aren't already there) before my log scanner process does its thing.

I'm not sure what the odds are of this NOT totally hosing the Flume
process/data. Maybe it recognizes the file is in use and waits? sed
processes the files pretty quickly (~15 secs), as they are rotated daily.

Has anyone else tried this yet or have any insight as to how Flume might
react before I attempt to make bit soup?

Thanks,
-Dave

Re: post-processing

Posted by Brock Noland <br...@cloudera.com>.

I wouldn't modify the files while Flume is also modifying them. It
might work, but it also might be a complete mess. If you need to modify
the events before they are written, interceptors are the correct solution.
Once Flume is done with a file, modify all you wish!
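
For the "modify it once Flume is done" route, here is a minimal Python
sketch of the sed step, not a tested recipe. It rests on assumptions that
are not from this thread: the most recently modified file in the sink
directory is taken to be the live one, and the inserted field ("site=HQ ")
and all function names are made up for illustration.

```python
import os
import tempfile
from pathlib import Path

TAG = "site=HQ "  # hypothetical field to insert; not from the thread


def finished_files(directory):
    """Return every regular file except the most recently modified one.

    Assumption: the newest file is the one Flume is still appending to.
    """
    files = sorted(
        (p for p in Path(directory).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
    )
    return files[:-1]


def tag_line(line, tag=TAG):
    """Prefix the line with the tag unless the tag is already present."""
    return line if tag in line else tag + line


def tag_file(path, tag=TAG):
    """Rewrite one finished file through a temp file and an atomic rename."""
    path = Path(path)
    before = path.stat()
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tagging-")
    with os.fdopen(fd, "w") as out, open(path) as src:
        for line in src:
            out.write(tag_line(line, tag))
    # Keep the original mtime so a rewritten file never looks newer than
    # the live file on the next run.
    os.utime(tmp_name, (before.st_atime, before.st_mtime))
    os.replace(tmp_name, path)


def tag_finished(directory, tag=TAG):
    """Tag every file in the directory except the presumed live one."""
    for path in finished_files(directory):
        tag_file(path, tag)
```

Calling tag_finished() on the sink directory leaves the newest file alone
and is safe to repeat, since lines that already carry the field are left
unchanged. A reader that has an old file open while it is replaced keeps
reading the old copy until it reopens the file.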
