Posted to common-user@hadoop.apache.org by Nathan Marz <na...@rapleaf.com> on 2008/11/18 23:37:56 UTC
What do you do with task logs?
We find that after about 400 to 500 jobs run in succession on our
Hadoop cluster, the disk space on each machine is quickly used up by
logs for all the tasks. What do people do to manage these logs? Does
Hadoop have anything built in for managing them? Or do we have to
delete/move the logs with a home-cooked method?
Thanks,
Nathan Marz
Rapleaf
Re: What do you do with task logs?
Posted by Edward Capriolo <ed...@gmail.com>.
We just set up a log4j server, which takes the logs off the cluster.
Plus you get all the benefits of log4j:
http://timarcher.com/?q=node/10
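A minimal sketch of the client-side log4j configuration this kind of setup assumes; the host and port are placeholders, and the receiving machine would need to run a log4j socket server (e.g. log4j's SimpleSocketServer) to accept the events:

```properties
# Ship log events to a central log4j server instead of local task logs.
# loghost.example.com:4560 are placeholders for your collector host/port.
log4j.rootLogger=INFO, SOCKET
log4j.appender.SOCKET=org.apache.log4j.net.SocketAppender
log4j.appender.SOCKET.RemoteHost=loghost.example.com
log4j.appender.SOCKET.Port=4560
log4j.appender.SOCKET.ReconnectionDelay=10000
```

With this in place the central server can apply whatever rolling/retention policy it likes, and the worker disks stop filling up.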
Re: What do you do with task logs?
Posted by Alex Loddengaard <al...@cloudera.com>.
You could take a look at Chukwa, which essentially collects and drops your
logs to HDFS:
<http://wiki.apache.org/hadoop/Chukwa>
The last time I tried to play with Chukwa, it wasn't in a state to be played
with yet. If that's still the case, then you can use Scribe to collect all
of your logs in a single place, and then create a quick Python script to
persist these logs to HDFS. Learn more about Scribe here:
<http://www.cloudera.com/blog/2008/11/02/configuring-and-using-scribe-for-hadoop-log-collection/>
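A rough sketch of the kind of quick Python script mentioned above, assuming Scribe has already collected the logs into a local directory and that the `hadoop` CLI is on the PATH; the directory names and HDFS target are hypothetical:

```python
import os
import subprocess

def put_command(local_file, hdfs_dir):
    """Build the 'hadoop fs -put' command line for one collected log file."""
    dest = hdfs_dir.rstrip("/") + "/" + os.path.basename(local_file)
    return ["hadoop", "fs", "-put", local_file, dest]

def persist_logs(local_dir, hdfs_dir):
    """Upload every file in local_dir to hdfs_dir, deleting each local
    copy only after its upload succeeds, to reclaim disk space."""
    for name in sorted(os.listdir(local_dir)):
        path = os.path.join(local_dir, name)
        if not os.path.isfile(path):
            continue
        if subprocess.call(put_command(path, hdfs_dir)) == 0:
            os.remove(path)

# Example: persist_logs("/var/log/scribe/hadoop", "/archive/tasklogs")
```

Run it from cron on the collector box; since HDFS rejects `-put` onto an existing path, re-runs won't clobber files that already made it in.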
Alex
On Tue, Nov 18, 2008 at 2:37 PM, Nathan Marz <na...@rapleaf.com> wrote:
> We find that after about 400 to 500 jobs run in succession on our Hadoop
> cluster, the disk space on each machine is quickly used up by logs for all
> the tasks. What do people do to manage these logs? Does Hadoop have anything
> built in for managing them? Or do we have to delete/move the logs with a
> home-cooked method?
>
> Thanks,
> Nathan Marz
> Rapleaf
>