Posted to common-user@hadoop.apache.org by Benyi Wang <be...@gmail.com> on 2011/09/21 20:01:00 UTC

How to get hadoop job information effectively?

I'm working on a project to collect MapReduce job information at the application
level. For example, a DW ETL process may involve several MapReduce jobs, and we
want a dashboard that shows the progress of those jobs for that specific
ETL process.

JobStatus does not provide all of the information shown on the JobTracker web
page. JobInProgress is used by JobTracker and JobHistory, but it lives in
JobTracker memory and does not seem to be exposed to the client side.

My current method is to read the history log files and the job conf XML
file and extract the information from them, the way jobdetailhistory.jsp and
jobhistory.jsp do.
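
As a rough sketch of the job conf half of this approach: the code below
assumes the job conf file uses the usual Hadoop configuration layout
(a <configuration> root with <property> elements that each hold a <name>
and a <value>). The class name JobConfReader is hypothetical, only the JDK
XML parser is used, and the history log files are not covered.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of Hadoop: reads a job conf XML file
// into a name -> value map using only the JDK's DOM parser.
final class JobConfReader {

    // Assumes the layout <configuration><property><name/><value/></property>...
    static Map<String, String> parse(InputStream in) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(in);
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    props.put(name, childText(property, "value"));
                }
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse job conf XML", e);
        }
    }

    // Returns the trimmed text of the first child element with the given
    // tag, or null if the property has no such element.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0
            ? null
            : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java JobConfReader <job-conf.xml>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            Map<String, String> conf = parse(in);
            // Property names as used by the old mapred configuration.
            System.out.println("job name: " + conf.get("mapred.job.name"));
            System.out.println("reduces:  " + conf.get("mapred.reduce.tasks"));
        }
    }
}
```

A dashboard could tag each job with its ETL process by setting a custom
property in the job conf at submit time and reading it back this way; that
property would be our own convention, not something Hadoop defines.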

Is there a better way to collect the kind of information that JobInProgress holds?

Thanks.

Re: How to get hadoop job information effectively?

Posted by Robert Evans <ev...@yahoo-inc.com>.

Not that I know of. We scrape the web pages, which is a horrible thing to do. There is a JIRA to add web service APIs that expose this type of information, but they are not going to be available for a while.

--Bobby Evans
