Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/11/18 23:37:56 UTC

What do you do with task logs?

We find that after about 400 to 500 jobs run in succession on our  
Hadoop cluster, the disk space on each machine is quickly used up by  
logs for all the tasks. What do people do to manage these logs? Does  
Hadoop have anything built in for managing them? Or do we have to  
delete/move the logs with a home-cooked method?

Thanks,
Nathan Marz
Rapleaf

Re: What do you do with task logs?

Posted by Edward Capriolo <ed...@gmail.com>.
We just set up a log4j server. This takes the logs off the cluster.
Plus you get all the benefits of log4j.

http://timarcher.com/?q=node/10
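
For reference, the kind of log4j.properties stanza that approach uses might look like this. This is only a sketch: the hostname and port are placeholders, and you would point Hadoop's log4j configuration at your own collector.

```properties
# Sketch: ship logs to a remote log4j SocketServer instead of local disk.
# RemoteHost and Port below are example values, not real defaults.
log4j.rootLogger=INFO, remote
log4j.appender.remote=org.apache.log4j.net.SocketAppender
log4j.appender.remote.RemoteHost=loghost.example.com
log4j.appender.remote.Port=4560
log4j.appender.remote.ReconnectionDelay=10000
```

On the receiving machine you can run log4j's bundled `org.apache.log4j.net.SimpleSocketServer` with the same port and a server-side log4j config to write the incoming events wherever you like.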

Re: What do you do with task logs?

Posted by Alex Loddengaard <al...@cloudera.com>.
You could take a look at Chukwa, which essentially collects your logs and
writes them to HDFS:
<http://wiki.apache.org/hadoop/Chukwa>

The last time I tried to play with Chukwa, it wasn't in a state to be played
with yet.  If that's still the case, then you can use Scribe to collect all
of your logs in a single place, and then create a quick Python script to
persist these logs to HDFS.  Learn more about Scribe here:

<http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/>
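
The "quick Python script" step might look something like the sketch below. The directory paths are assumptions about where your collector (e.g. Scribe) drops files, and it shells out to the standard `hadoop fs -put` command, so the `hadoop` CLI must be on the PATH.

```python
#!/usr/bin/env python
"""Sketch: push aggregated log files into HDFS, then reclaim local disk.

Assumes logs have already been collected into LOCAL_LOG_DIR by something
like Scribe. All paths here are examples, not real defaults.
"""
import os
import subprocess

LOCAL_LOG_DIR = "/var/log/scribe/hadoop"  # where the collector writes (assumption)
HDFS_DEST_DIR = "/logs/tasklogs"          # destination directory in HDFS (assumption)


def put_command(local_path, hdfs_dir):
    """Build the `hadoop fs -put` command line for one file."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


def persist_logs(local_dir=LOCAL_LOG_DIR, hdfs_dir=HDFS_DEST_DIR, dry_run=False):
    """Upload each file under local_dir, deleting it only after a successful put."""
    commands = []
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if not os.path.isfile(path):
            continue
        cmd = put_command(path, hdfs_dir)
        commands.append(cmd)
        if not dry_run:
            subprocess.check_call(cmd)  # raises CalledProcessError on failure
            os.remove(path)             # free local disk only once HDFS has the file
    return commands


if __name__ == "__main__":
    persist_logs()
```

Run it from cron at whatever interval keeps local disk usage bounded; deleting only after `check_call` succeeds means a failed upload leaves the file in place for the next run.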

Alex

On Tue, Nov 18, 2008 at 2:37 PM, Nathan Marz <na...@rapleaf.com> wrote:

> We find that after about 400 to 500 jobs run in succession on our Hadoop
> cluster, the disk space on each machine is quickly used up by logs for all
> the tasks. What do people do to manage these logs? Does Hadoop have anything
> built in for managing them? Or do we have to delete/move the logs with a
> home-cooked method?
>
> Thanks,
> Nathan Marz
> Rapleaf
>