Posted to mapreduce-user@hadoop.apache.org by Bejoy KS <be...@gmail.com> on 2011/09/23 09:52:36 UTC

Maintaining map reduce job logs - The best practices

Hi All
             I have a query on maintaining Hadoop map-reduce logs.
By default the logs appear on the respective task tracker nodes, and you can
easily drill down to them from the job tracker web UI whenever a failure
occurs (which is what I have been doing until now). Now I need to go a step
further and manage the logs corresponding to individual jobs. In my logs I
dump some key parameters related to my business, which could be used for
business-level debugging/analysis in the future if required. For this
purpose, I need a central log file per job (not many files, i.e. one per
task tracker, because as the cluster grows the number of log files
corresponding to a job also increases). A single point of reference makes
things handy for analysis by any business folks.
    I think managing and archiving the logs of each job execution would be a
generic requirement for any enterprise application, so there are surely best
practices and standards identified and maintained by most of the core Hadoop
enterprise users. Could you please share some of the better options for
managing Hadoop map-reduce logs? It would greatly help me choose the
practice that suits my environment and application needs.

Thank You

Regards
Bejoy.K.S

Re: Maintaining map reduce job logs - The best practices

Posted by Mathias Herberts <ma...@gmail.com>.
> You can find the job-specific logs in two places. The first one is the HDFS output directory. The second place is under $HADOOP_HOME/logs/history ($HADOOP_HOME/logs/history/done).
>
> Both these places have the config file and the job logs for each submitted job.

Those logs in 'history/done' will get discarded after an
unconfigurable (as of yet) delay, set to 30 days IIRC.

Our strategy is to move those logs daily into 'history/archive', which
won't get wiped.
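
For illustration, the daily move can be as simple as the sketch below,
run from cron on the JobTracker node. The paths and the class name are
only assumptions about a typical install, not our exact script.

import java.io.File;

// Sketch only: relocate completed job-history files (and their _conf.xml
// companions) from history/done to a history/archive directory that the
// retention cleanup does not touch. Adjust the paths to your installation.
public class JobHistoryArchiver {
    public static void main(String[] args) {
        String hadoopHome = System.getenv("HADOOP_HOME");
        if (hadoopHome == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        File doneDir = new File(hadoopHome, "logs/history/done");
        File archiveDir = new File(hadoopHome, "logs/history/archive");
        archiveDir.mkdirs();

        File[] files = doneDir.listFiles();
        if (files == null) {
            return; // done directory missing or unreadable
        }
        for (File f : files) {
            // renameTo is fine here because both directories live on the
            // same local filesystem of the JobTracker node.
            if (!f.renameTo(new File(archiveDir, f.getName()))) {
                System.err.println("Could not move " + f.getPath());
            }
        }
    }
}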

Re: Maintaining map reduce job logs - The best practices

Posted by be...@gmail.com.
Great! Thanks Raj and Mathias.
Just a clarification query on top of my question.
I want to log some information about my processing/data into my log files.
I'm planning to log it with LOG.debug(); if I do so in my mapper or reducer, will it be available under the HADOOP_HOME/logs/history dir? (A rough sketch of what I have in mind follows below.)
Second question: once a job has executed, do the logs from all task trackers get dumped into the HADOOP_HOME/logs/history dir on the name node/job tracker?
Third question: how do I enable DEBUG mode for the logger? Or is it enabled by default? If not, what is the default logger level in Hadoop?
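
To make the first question concrete, this is roughly the kind of
mapper-side logging I have in mind. It is just a sketch using Commons
Logging (which Hadoop itself uses); the class name and the logged fields
are placeholders for my real business parameters.

import java.io.IOException;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AuditMapper extends Mapper<LongWritable, Text, Text, Text> {

    private static final Log LOG = LogFactory.getLog(AuditMapper.class);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Business-level detail goes to the task log at DEBUG level.
        if (LOG.isDebugEnabled()) {
            LOG.debug("Processing record at offset " + key.get()
                    + ", length " + value.getLength());
        }
        context.write(new Text("record"), value);
    }
}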

Thanks a lot folks.

Regards
Bejoy K S

-----Original Message-----
From: Raj Vishwanathan <ra...@yahoo.com>
Date: Fri, 23 Sep 2011 06:10:41 
To: common-user@hadoop.apache.org<co...@hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Cc: common-user@hadoop.apache.org<co...@hadoop.apache.org>; mapreduce-user@hadoop.apache.org<ma...@hadoop.apache.org>
Subject: Re: Maintaining map reduce job logs - The best practices

Bejoy

You can find the job-specific logs in two places. The first one is the HDFS output directory. The second place is under $HADOOP_HOME/logs/history ($HADOOP_HOME/logs/history/done).

Both these places have the config file and the job logs for each submitted job.


Sent from my iPad
Please excuse the typos. 

On Sep 23, 2011, at 12:52 AM, Bejoy KS <be...@gmail.com> wrote:

> Hi All
>             I have a query on maintaining Hadoop map-reduce logs.
> By default the logs appear on the respective task tracker nodes, and you can
> easily drill down to them from the job tracker web UI whenever a failure
> occurs (which is what I have been doing until now). Now I need to go a step
> further and manage the logs corresponding to individual jobs. In my logs I
> dump some key parameters related to my business, which could be used for
> business-level debugging/analysis in the future if required. For this
> purpose, I need a central log file per job (not many files, i.e. one per
> task tracker, because as the cluster grows the number of log files
> corresponding to a job also increases). A single point of reference makes
> things handy for analysis by any business folks.
>    I think managing and archiving the logs of each job execution would be a
> generic requirement for any enterprise application, so there are surely best
> practices and standards identified and maintained by most of the core Hadoop
> enterprise users. Could you please share some of the better options for
> managing Hadoop map-reduce logs? It would greatly help me choose the
> practice that suits my environment and application needs.
> 
> Thank You
> 
> Regards
> Bejoy.K.S

Re: Maintaining map reduce job logs - The best practices

Posted by Raj Vishwanathan <ra...@yahoo.com>.
Bejoy

You can find the job-specific logs in two places. The first one is the HDFS output directory. The second place is under $HADOOP_HOME/logs/history ($HADOOP_HOME/logs/history/done).

Both these places have the config file and the job logs for each submitted job.
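
For example, a quick way to list the per-job files on the HDFS side is
something like the sketch below. It assumes the default _logs/history
location inside the job output directory; adjust it if
hadoop.job.history.user.location points somewhere else.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the job history and config files that were copied into
// the job's HDFS output directory (passed as the first argument).
public class ListJobHistory {
    public static void main(String[] args) throws Exception {
        Path historyDir = new Path(args[0], "_logs/history");
        FileSystem fs = historyDir.getFileSystem(new Configuration());
        for (FileStatus status : fs.listStatus(historyDir)) {
            System.out.println(status.getPath() + "\t" + status.getLen() + " bytes");
        }
    }
}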


Sent from my iPad
Please excuse the typos. 

On Sep 23, 2011, at 12:52 AM, Bejoy KS <be...@gmail.com> wrote:

> Hi All
>             I have a query on maintaining Hadoop map-reduce logs.
> By default the logs appear on the respective task tracker nodes, and you can
> easily drill down to them from the job tracker web UI whenever a failure
> occurs (which is what I have been doing until now). Now I need to go a step
> further and manage the logs corresponding to individual jobs. In my logs I
> dump some key parameters related to my business, which could be used for
> business-level debugging/analysis in the future if required. For this
> purpose, I need a central log file per job (not many files, i.e. one per
> task tracker, because as the cluster grows the number of log files
> corresponding to a job also increases). A single point of reference makes
> things handy for analysis by any business folks.
>    I think managing and archiving the logs of each job execution would be a
> generic requirement for any enterprise application, so there are surely best
> practices and standards identified and maintained by most of the core Hadoop
> enterprise users. Could you please share some of the better options for
> managing Hadoop map-reduce logs? It would greatly help me choose the
> practice that suits my environment and application needs.
> 
> Thank You
> 
> Regards
> Bejoy.K.S