Posted to common-user@hadoop.apache.org by Patrick Datko <pa...@ymc.ch> on 2010/04/23 14:38:52 UTC

Hadoop Log Collection

Hey everyone,

I have been working with Hadoop for a few weeks to build a cluster with
HDFS. I looked at several monitoring tools to observe my cluster and found
a good solution in Ganglia + Nagios. To complete the monitoring part of
the cluster, I am looking for a log collection tool that stores the
nodes' log files centrally. I have tested Chukwa and Facebook's
Scribe, but neither is a simple tool for just storing log files; in my
opinion they are too heavyweight for such a job.

So I have been thinking about writing my own LogCollector. I don't need
anything special. My idea is to build a daemon that can be
installed on every node in the cluster, plus one XML file that describes
which log files have to be collected. At a configured time interval, the
daemon would collect all the listed log files and store them in HDFS
using the Java API.

This is just an idea for a simple LogCollector, and it would be cool if you
could give me your opinion on it or tell me whether such a LogCollector
already exists.
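
The collector described above can be sketched roughly as follows. This is
only an illustration in Python rather than the Java API mentioned in the
message. The XML layout (a `collector` element with an `interval-seconds`
attribute and `log` children), the function names, and the
`host/timestamp/filename` destination layout are all assumptions made up for
the example. The upload step is an injectable function, and the one shown
here copies into a local directory as a stand-in for HDFS.

```python
import shutil
import socket
import time
import xml.etree.ElementTree as ET
from pathlib import Path


def read_config(xml_text):
    """Parse the (hypothetical) config: an interval plus a list of log paths."""
    root = ET.fromstring(xml_text)
    interval = int(root.get("interval-seconds", "300"))
    paths = [Path(e.text.strip()) for e in root.findall("log") if e.text]
    return interval, paths


def local_copy_uploader(target_root):
    """Stand-in uploader: copies into a local directory instead of HDFS."""
    target_root = Path(target_root)

    def upload(src, dest_rel):
        dest = target_root / dest_rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dest)

    return upload


def collect_once(paths, upload, host=None, now=None):
    """Upload every configured log that exists; return the destinations used."""
    host = host or socket.gethostname()
    # now=None means "current time"; a fixed value makes runs reproducible.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime(now))
    sent = []
    for src in paths:
        if not src.is_file():
            continue  # a missing log is skipped, not treated as fatal
        dest_rel = f"{host}/{stamp}/{src.name}"
        upload(src, dest_rel)
        sent.append(dest_rel)
    return sent


def run_forever(xml_text, upload):
    """Daemon loop: collect, sleep for the configured interval, repeat."""
    interval, paths = read_config(xml_text)
    while True:
        collect_once(paths, upload)
        time.sleep(interval)
```

A real uploader would write to HDFS instead, for example by shelling out to
`hadoop fs -put` or by using the Java API from a Java daemon. Note that this
sketch uses only the file name in the destination, so two logs with the same
name in different directories on one node would overwrite each other.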

Kind regards,
Patrick 


Re: Hadoop Log Collection

Posted by Ariel Rabkin <as...@gmail.com>.
It should actually be straightforward to do this with Chukwa.  Chukwa
has a bunch of other pieces, but at its core, it does basically what
you describe.

The one complexity is that instead of storing each file separately,
Chukwa runs them together into larger sequence files.  This turns out
to be important if you want good filesystem performance, if you have
large data volumes, or if you want to keep metadata telling you which
machine each file came from.
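
To illustrate the idea of running many small log files together into one
larger file while keeping per-file metadata, here is a minimal Python sketch.
It is not Chukwa's code and it does not use Hadoop's SequenceFile format; the
record layout (a 4-byte length, a JSON header, then the raw bytes) and the
function names are assumptions chosen only to show the principle.

```python
import io
import json
import struct


def append_record(out, host, path, data):
    """Write one record: 4-byte header length, JSON header, then raw bytes."""
    header = json.dumps(
        {"host": host, "path": path, "size": len(data)}
    ).encode("utf-8")
    out.write(struct.pack(">I", len(header)))
    out.write(header)
    out.write(data)


def read_records(inp):
    """Yield (header_dict, data) pairs until the stream is exhausted."""
    while True:
        prefix = inp.read(4)
        if len(prefix) < 4:
            return  # end of stream (or a truncated trailing record)
        (header_len,) = struct.unpack(">I", prefix)
        header = json.loads(inp.read(header_len).decode("utf-8"))
        yield header, inp.read(header["size"])


# Usage: two small logs from different machines end up in one container.
container = io.BytesIO()
append_record(container, "node1", "/var/log/hadoop/datanode.log", b"a\nb\n")
append_record(container, "node2", "/var/log/hadoop/datanode.log", b"c\n")
container.seek(0)
for header, data in read_records(container):
    print(header["host"], header["path"], len(data))
```

Because each record carries its own header, the originating machine and path
survive even though the filesystem only sees a single large file.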

--Ari


-- 
Ari Rabkin asrabkin@gmail.com
UC Berkeley Computer Science Department

Re: Hadoop Log Collection

Posted by Pierre ANCELOT <pi...@gmail.com>.
What we have done is configure syslog to forward all logging to a log
server.
All logs are sent using UDP.
It works like a charm: we have one directory per node, and for some logs we
have everything mixed together in a single file...
Check out the syslog documentation...
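
For readers who want to see the mechanism, the following Python sketch mimics
what such a setup does: a sender emits a line over UDP with a syslog-style
priority prefix, and a receiver files it under a directory named after the
sending node. It is not the syslog daemon configuration described above, only
a stand-in; the function names, the `messages.log` file name, and the use of
the sender's IP address as the directory name are assumptions.

```python
import socket
from pathlib import Path


def open_receiver(host="127.0.0.1", port=0):
    """Bind a UDP socket; port 0 lets the OS pick a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def handle_one(sock, log_root, timeout=5.0):
    """Receive one datagram, append it to <log_root>/<sender-ip>/messages.log."""
    sock.settimeout(timeout)
    data, (sender_ip, _port) = sock.recvfrom(65535)
    node_dir = Path(log_root) / sender_ip
    node_dir.mkdir(parents=True, exist_ok=True)
    line = data.decode("utf-8", errors="replace").rstrip("\n")
    with open(node_dir / "messages.log", "a", encoding="utf-8") as f:
        f.write(line + "\n")
    return sender_ip, line


def send_syslog_line(message, server, priority=14):
    """Send one message with a <priority> prefix (14 = user.info) over UDP."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.sendto(f"<{priority}>{message}".encode("utf-8"), server)
```

A quick local round trip would call `open_receiver()`, pass the socket's
`getsockname()` to `send_syslog_line`, and then call `handle_one`. Keep in
mind that UDP gives no delivery guarantee, so lines can be lost if the
network or the log server is overloaded.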



-- 
http://www.neko-consulting.com
Ego sum quis ego servo
"Je suis ce que je protège"
"I am what I protect"