Posted to general@hadoop.apache.org by Scott Whitecross <sw...@gmail.com> on 2010/08/11 17:54:05 UTC

Listing Hadoop Job History Statistics

Hi -

What's the best way to list and query information on Hadoop job histories?
 For example, I'd like to see the job names from the past week against a
Hadoop cluster I'm using.   I don't see an API call or a way through the
command line to pull the information.  Is the best way writing a quick
script to process the job history files?

Thanks.
Scott

Re: Listing Hadoop Job History Statistics

Posted by Ranjit Mathew <ra...@yahoo-inc.com>.
[BCC-ing "general" - again.]

On Tuesday 17 August 2010 07:36 AM, Scott Whitecross wrote:
> Thanks for the answers Doug and Arun.   I'm assuming the job-history files
> mentioned are in ./hadoop-0.20/logs/history/done/.  The files look like they
> were serialized by a class in Hadoop?  (If I can read the files back into
> the appropriate class, and then dump them out into a custom format, that'd
> be great.)

Rumen (src/tools/org/apache/hadoop/tools/rumen/) parses Job History files
and creates JSON files that can either be loaded independently, or via
the API provided by Rumen itself. As an added benefit, it abstracts away
the differences between the 0.20.xx format and the Avro-based format used
in trunk.

There is not much documentation on Rumen right now, but MAPREDUCE-1918
(https://issues.apache.org/jira/browse/MAPREDUCE-1918) attempts to fix
that.

HTH,
Ranjit

> On Thu, Aug 12, 2010 at 12:52 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:
>
>> Moving to mapreduce-user@, bcc general@.
>>
>> There isn't a direct way. One possible option is just use the per-job
>> job-history file which is on HDFS (See
>> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring for info on job-history).
>>
>> Hope that helps.
>>
>> Arun
>>
>>
>> On Aug 11, 2010, at 8:54 AM, Scott Whitecross wrote:
>>
>>> Hi -
>>>
>>> What's the best way to list and query information on Hadoop job histories?
>>> For example, I'd like to see the job names from the past week against a
>>> Hadoop cluster I'm using.   I don't see an API call or a way through the
>>> command line to pull the information.  Is the best way writing a quick
>>> script to process the job history files?
>>>
>>> Thanks.
>>> Scott
>>>
>>
>>


Re: Listing Hadoop Job History Statistics

Posted by Scott Whitecross <sw...@gmail.com>.
Thanks for the answers Doug and Arun.   I'm assuming the job-history files
mentioned are in ./hadoop-0.20/logs/history/done/.  The files look like they
were serialized by a class in Hadoop?  (If I can read the files back into
the appropriate class, and then dump them out into a custom format, that'd
be great.)

Thanks.



On Thu, Aug 12, 2010 at 12:52 AM, Arun C Murthy <ac...@yahoo-inc.com> wrote:

> Moving to mapreduce-user@, bcc general@.
>
> There isn't a direct way. One possible option is just use the per-job
> job-history file which is on HDFS (See
> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring for info on job-history).
>
> Hope that helps.
>
> Arun
>
>
> On Aug 11, 2010, at 8:54 AM, Scott Whitecross wrote:
>
>> Hi -
>>
>> What's the best way to list and query information on Hadoop job histories?
>> For example, I'd like to see the job names from the past week against a
>> Hadoop cluster I'm using.   I don't see an API call or a way through the
>> command line to pull the information.  Is the best way writing a quick
>> script to process the job history files?
>>
>> Thanks.
>> Scott
>>
>
>

Re: Listing Hadoop Job History Statistics

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
Moving to mapreduce-user@, bcc general@.

There isn't a direct way. One possible option is to just use the per-job
job-history file which is on HDFS (see
http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring
for info on job-history).

Hope that helps.

Arun

On Aug 11, 2010, at 8:54 AM, Scott Whitecross wrote:

> Hi -
>
> What's the best way to list and query information on Hadoop job histories?
> For example, I'd like to see the job names from the past week against a
> Hadoop cluster I'm using.   I don't see an API call or a way through the
> command line to pull the information.  Is the best way writing a quick
> script to process the job history files?
>
> Thanks.
> Scott


Re: Listing Hadoop Job History Statistics

Posted by Doug Balog <do...@dugos.com>.
I don't know if this is the best way, but this is how I do it.

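import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobStatus;

public class ListCompletedJobs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "jobTracker" and 9001 are placeholders for the JobTracker host and port.
        JobClient jobClient = new JobClient(new InetSocketAddress("jobTracker", 9001), conf);
        jobClient.setConf(conf); // Bug in constructor, doesn't set conf.

        for (JobStatus js : jobClient.getAllJobs()) {
            // We only care about completed jobs.
            if (!js.isJobComplete()) {
                continue;
            }
            // Do stuff on jobStatus here, e.g. print js.getJobID().
        }
    }
}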

You can also scrape info from http://jobtracker:50030/jobhistory.jsp

Or read it from the job's outputDir/_logs/history/ directory.
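
For the "quick script" route asked about in the original question, a
minimal sketch in plain Java (no Hadoop classes) could look like the
following. It assumes the 0.20-era text history format, in which each
line is a record type followed by KEY="value" pairs, and the job
submission line carries JOBNAME and SUBMIT_TIME (milliseconds since the
epoch). The class and method names (JobHistoryLines, parseLine,
jobNamesSince) are made up for illustration, and the format should be
checked against the files on your own cluster before relying on it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: list job names from 0.20-style job history text (format is an assumption). */
final class JobHistoryLines {
    // Matches KEY="value"; a backslash inside the value escapes the next character.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Parses one history line into KEY -> raw value pairs (values are not unescaped). */
    static Map<String, String> parseLine(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(line);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Returns the names of jobs whose SUBMIT_TIME is at or after sinceMillis. */
    static List<String> jobNamesSince(List<String> lines, long sinceMillis) {
        List<String> names = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith("Job ")) {
                continue; // only job-level records are of interest here
            }
            Map<String, String> fields = parseLine(line);
            String name = fields.get("JOBNAME");
            String submitted = fields.get("SUBMIT_TIME");
            if (name == null || submitted == null) {
                continue; // other Job records (launch, finish) lack these keys
            }
            try {
                if (Long.parseLong(submitted) >= sinceMillis) {
                    names.add(name);
                }
            } catch (NumberFormatException e) {
                // Skip lines with a malformed timestamp.
            }
        }
        return names;
    }

    /** Usage: java JobHistoryLines historyFile [historyFile ...] */
    public static void main(String[] args) throws IOException {
        long weekAgo = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
        for (String path : args) {
            List<String> lines = Files.readAllLines(Paths.get(path), StandardCharsets.UTF_8);
            for (String name : jobNamesSince(lines, weekAgo)) {
                System.out.println(name);
            }
        }
    }
}
```

Run it against local copies of the history files, one path per
argument. It does not cover the Avro-based format in trunk.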

Cheers,

Doug


On Aug 11, 2010, at 11:54 AM, Scott Whitecross wrote:

> Hi -
> 
> What's the best way to list and query information on Hadoop job histories?
> For example, I'd like to see the job names from the past week against a
> Hadoop cluster I'm using.   I don't see an API call or a way through the
> command line to pull the information.  Is the best way writing a quick
> script to process the job history files?
> 
> Thanks.
> Scott