Posted to common-user@hadoop.apache.org by Ariel Rabkin <as...@gmail.com> on 2009/10/05 19:36:06 UTC

Re: How best to collect userlogs (in a streaming world)

You might also look at Chukwa -- this is precisely the problem Chukwa was
originally designed to solve, and we're pretty much there. Chukwa is a
particularly natural fit if you want your logs stored in HDFS.

On Mon, Sep 28, 2009 at 2:18 PM, Dan Milstein <dm...@hubspot.com> wrote:
> Hadoop-folk,
>
> How have people gone about collecting debug/error log information from
> streaming jobs in Hadoop?
>
> I'm clear that if I write to stderr (and it's not a counter/status line --
> see the mapper sketch below), then it goes onto the node's local disk, in:
>
>  /var/log/hadoop/userlogs/<task attempt>/stderr
>
> However, I'd really like to collect those in some central location, for
> processing.  Possibly via splunk (which we use right now), possibly some
> other means.
>
>  - Do people write a custom log4j appender?  (does log4j even control writes
> to that stderr file?  I can't tell -- it somewhat looks like no)
>
>  - Or, maybe write cron jobs that run on the slaves and periodically push
> logs somewhere?  (a sketch of that approach appears below as well)
>
>  - Are people outside of Facebook using Scribe?
>
> Any ideas / experiences appreciated.
>
> Thanks,
> -Dan Milstein
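
To illustrate the stderr convention Dan mentions: in a streaming job, a line
written to stderr that starts with "reporter:" is consumed by the framework
as a counter or status update; everything else ends up in the per-attempt
stderr file on the node's local disk.  Here is a minimal Python mapper
sketch (the counter group and names are made up for illustration):

    #!/usr/bin/env python
    # Minimal streaming mapper sketch.  Ordinary stderr output lands in
    # /var/log/hadoop/userlogs/<task attempt>/stderr on the node, while
    # lines of the form "reporter:counter:group,counter,amount" are
    # interpreted by Hadoop Streaming as counter updates instead.
    import sys

    for line in sys.stdin:
        # plain debug output -> per-attempt stderr file on local disk
        sys.stderr.write("DEBUG: saw record of length %d\n" % len(line))
        # counter update -> consumed by the framework, not logged
        sys.stderr.write("reporter:counter:MyApp,RecordsSeen,1\n")
        # pass the record through unchanged
        sys.stdout.write(line)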
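
As for the cron idea, here is a rough sketch of a script you could run from
cron on each slave to ship per-attempt stderr files into HDFS.  The HDFS
destination path is hypothetical, and this naive version doesn't guard
against copying logs that a still-running task is appending to:

    #!/usr/bin/env python
    # Hypothetical log-shipping cron job: push each task attempt's stderr
    # from the slave's local disk into a per-host directory in HDFS.
    import os
    import subprocess

    LOCAL_ROOT = "/var/log/hadoop/userlogs"
    HDFS_ROOT = "/logs/userlogs"          # hypothetical destination
    HOST = os.uname()[1]

    for attempt in os.listdir(LOCAL_ROOT):
        src = os.path.join(LOCAL_ROOT, attempt, "stderr")
        if not os.path.isfile(src):
            continue
        dst = "%s/%s/%s.stderr" % (HDFS_ROOT, HOST, attempt)
        # "hadoop fs -put" refuses to overwrite an existing file, so
        # attempts that were already shipped are skipped (non-zero exit).
        subprocess.call(["hadoop", "fs", "-put", src, dst])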



-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department