Posted to user@ambari.apache.org by Aaron Cody <ac...@hexiscyber.com> on 2014/01/17 22:59:56 UTC

Jobs view .. how to hook into it....

hello
I'm looking at integrating my own process into the Ambari 'Jobs' view … and
I can see how the web side of things works .. i.e. the view makes REST calls
to the server which in turn results in a query to postgres to get the job
stats … but what is not so clear is how those job/task stats get into
postgres in the first place….
Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting the
job/task info into postgres directly? Or is there some other mechanism in
Ambari that is listening for map reduce jobs/tasks to start/finish?

any hints on where to look in the source tree would be greatly appreciated
TIA



Re: Jobs view .. how to hook into it....

Posted by Billie Rinaldi <bi...@gmail.com>.
On Wed, Jan 29, 2014 at 10:29 AM, Aaron Cody <ac...@hexiscyber.com> wrote:

> yes both of those things... and maybe a bit more explanation on how they
> were implemented for Hive/Pig ...
>

Let's say you have a workflow that consists of 3 MapReduce jobs.  Maybe the
workflow is a specific Hive query, or Pig script, or maybe you just have
your own script that kicks off the 3 jobs.   You run this workflow
repeatedly, and you want to be able to evaluate the relative performance of
different runs of the entire workflow as a whole -- maybe one of the jobs
is slow sometimes, but you don't know which one, or why.  To group together
the MR jobs for a particular run of the workflow, you assign each run a
unique ID, e.g. appname_run0001.  Then when you're configuring the MR jobs,
you add this ID to the job conf under the mapreduce.workflow.id property.
You probably actually have multiple types of workflows (like different Hive
queries or Pig scripts that you run repeatedly), so you can give each
workflow type a name (mapreduce.workflow.name) and use that to filter your
workflows in the web app.
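
For example, a small helper like the one below could be called by your driver
before it submits each job.  This is just a sketch: the class name, the workflow
name, and the run ID values are made up, and the only real pieces are the
mapreduce.workflow.* property keys described above.

import org.apache.hadoop.conf.Configuration;

public class WorkflowTagging {
  // Tag a job's configuration so Ambari can group it into a workflow run.
  public static void tagJob(Configuration conf, String workflowName, String runId) {
    conf.set("mapreduce.workflow.name", workflowName); // workflow type, used for filtering
    conf.set("mapreduce.workflow.id", runId);          // unique per run, groups the run's jobs
  }
}

e.g. tagJob(jobConf, "nightly-etl", "appname_run0001") on every job belonging to
that particular run.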

Let's say in your 3 job workflow, job A runs first, then job B runs on the
output of job A, then job C uses the output of both A and B.  You can
capture these dependencies by using the adjacency properties.  Then the web
app can display the jobs in a DAG.  The following shows B and C depending
on A and C depending on B.  The last piece needed to make the DAG work is
that we have to know whether a particular MR job is an instance of A, B, or
C.  You specify this in the job conf by setting the
mapreduce.workflow.node.name property.  The job identifiers I'm using here
are single letters, but they could be anything.  Hive uses its internal
stage identifiers, and Pig uses some kind of counter.
conf.setStrings("mapreduce.workflow.adjacency.A", new String[]{"B", "C"});
conf.setStrings("mapreduce.workflow.adjacency.B", new String[]{"C"});

For Pig's implementation, look for mapreduce.workflow in this file:
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/tools/pigstats/mapreduce/MRScriptState.java

For Hive's implementation, look for mapreduce.workflow in this file:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java
As well as the setWorkflowAdjacencies method in this file:
http://svn.apache.org/repos/asf/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java


> Also, the workflow.workflowcontext column ... looks like a blob of JSON
> which I guess ends up in some model in the web app? but how to construct
> it..? (the regex code in MapReduceJobHistoryUpdater.java is not exactly
> straightforward :)  )
>

A MapReduce workflow-producing app doesn't have to construct the object.
MapReduceJobHistoryUpdater does it for you based on the mapreduce.workflow
properties the app sets in the job configuration.  The regexes it uses only
strip off escape characters added by JobHistory when it logs the
information.  If you want a Java representation of the JSON, it is a
WorkflowContext object.  WorkflowContext is present in both ambari-log4j
and ambari-server.
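
For the A/B/C example, the workflowcontext blob would end up looking roughly like
the JSON below.  This is a hand-written sketch based on the property names above,
not output captured from a cluster, so treat the exact field names as approximate
and check the WorkflowContext class for the authoritative ones.

{
  "workflowId": "appname_run0001",
  "workflowName": "abc-example",
  "workflowEntityName": "A",
  "workflowDag": {
    "entries": [
      { "source": "A", "targets": ["B", "C"] },
      { "source": "B", "targets": ["C"] }
    ]
  }
}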


> thanks
> A
>
> From: Billie Rinaldi <bi...@gmail.com>
> Reply-To: <us...@ambari.apache.org>
> Date: Wed, 29 Jan 2014 07:03:32 -0800
>
> To: <us...@ambari.apache.org>
> Subject: Re: Jobs view .. how to hook into it....
>
> Sure.  Which part is confusing?  The adjacencies?  Or why you would use it
> at all?
>
>
> On Tue, Jan 28, 2014 at 4:47 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
>
>> thanks Billie - do you think you could go into a little more detail about
>> the workflow DAG stuff on the wiki? it's a little cryptic (to me anyway)  :)
>>
>> From: Billie Rinaldi <bi...@gmail.com>
>> Reply-To: <us...@ambari.apache.org>
>> Date: Mon, 20 Jan 2014 07:40:04 -0800
>> To: <us...@ambari.apache.org>
>> Subject: Re: Jobs view .. how to hook into it....
>>
>> In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory
>> that inserts the data into postgres (or whichever db you have configured).
>> The code is in contrib/ambari-log4j.
>>
>> Billie
>>
>>
>> On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
>>
>>> hello
>>> I'm looking at integrating my own process into the Ambari 'Jobs' view ...
>>> and I can see how the web side of things works .. i.e. the view makes REST
>>> calls to the server which in turn results in a query to postgres to get the
>>> job stats ... but what is not so clear is how those job/task stats get into
>>> postgres in the first place....
>>> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting
>>> the job/task info into postgres directly? Or is there some other mechanism
>>> in Ambari that is listening for map reduce jobs/tasks to start/finish?
>>>
>>> any hints on where to look in the source tree would be greatly
>>> appreciated
>>> TIA
>>>
>>
>>
>

Re: Jobs view .. how to hook into it....

Posted by Aaron Cody <ac...@hexiscyber.com>.
yes both of those things… and maybe a bit more explanation on how they were
implemented for Hive/Pig …
Also, the workflow.workflowcontext column … looks like a blob of JSON which
I guess ends up in some model in the web app? but how to construct it..?
(the regex code in MapReduceJobHistoryUpdater.java is not exactly
straightforward :)  )
thanks
A

From:  Billie Rinaldi <bi...@gmail.com>
Reply-To:  <us...@ambari.apache.org>
Date:  Wed, 29 Jan 2014 07:03:32 -0800
To:  <us...@ambari.apache.org>
Subject:  Re: Jobs view .. how to hook into it....

Sure.  Which part is confusing?  The adjacencies?  Or why you would use it
at all?


On Tue, Jan 28, 2014 at 4:47 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
> thanks Billie - do you think you could go into a little more detail about the
> workflow DAG stuff on the wiki? it's a little cryptic (to me anyway)  :)
> 
> From:  Billie Rinaldi <bi...@gmail.com>
> Reply-To:  <us...@ambari.apache.org>
> Date:  Mon, 20 Jan 2014 07:40:04 -0800
> To:  <us...@ambari.apache.org>
> Subject:  Re: Jobs view .. how to hook into it....
> 
> In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory that
> inserts the data into postgres (or whichever db you have configured).  The
> code is in contrib/ambari-log4j.
> 
> Billie
> 
> 
> On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
>> hello
>> I'm looking at integrating my own process into the Ambari 'Jobs' view … and I
>> can see how the web side of things works .. i.e. the view makes REST calls to
>> the server which in turn results in a query to postgres to get the job stats
>> … but what is not so clear is how those job/task stats get into postgres in
>> the first place….
>> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting the
>> job/task info into postgres directly? Or is there some other mechanism in
>> Ambari that is listening for map reduce jobs/tasks to start/finish?
>> 
>> any hints on where to look in the source tree would be greatly appreciated
>> TIA
> 




Re: Jobs view .. how to hook into it....

Posted by Billie Rinaldi <bi...@gmail.com>.
Sure.  Which part is confusing?  The adjacencies?  Or why you would use it
at all?


On Tue, Jan 28, 2014 at 4:47 PM, Aaron Cody <ac...@hexiscyber.com> wrote:

> thanks Billie - do you think you could go into a little more detail about
> the workflow DAG stuff on the wiki? it's a little cryptic (to me anyway)  :)
>
> From: Billie Rinaldi <bi...@gmail.com>
> Reply-To: <us...@ambari.apache.org>
> Date: Mon, 20 Jan 2014 07:40:04 -0800
> To: <us...@ambari.apache.org>
> Subject: Re: Jobs view .. how to hook into it....
>
> In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory
> that inserts the data into postgres (or whichever db you have configured).
> The code is in contrib/ambari-log4j.
>
> Billie
>
>
> On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
>
>> hello
>> I'm looking at integrating my own process into the Ambari 'Jobs' view ...
>> and I can see how the web side of things works .. i.e. the view makes REST
>> calls to the server which in turn results in a query to postgres to get the
>> job stats ... but what is not so clear is how those job/task stats get into
>> postgres in the first place....
>> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting
>> the job/task info into postgres directly? Or is there some other mechanism
>> in Ambari that is listening for map reduce jobs/tasks to start/finish?
>>
>> any hints on where to look in the source tree would be greatly appreciated
>> TIA
>>
>
>

Re: Jobs view .. how to hook into it....

Posted by Aaron Cody <ac...@hexiscyber.com>.
thanks Billie - do you think you could go into a little more detail about
the workflow DAG stuff on the wiki? it's a little cryptic (to me anyway)  :)

From:  Billie Rinaldi <bi...@gmail.com>
Reply-To:  <us...@ambari.apache.org>
Date:  Mon, 20 Jan 2014 07:40:04 -0800
To:  <us...@ambari.apache.org>
Subject:  Re: Jobs view .. how to hook into it....

In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory
that inserts the data into postgres (or whichever db you have configured).
The code is in contrib/ambari-log4j.

Billie


On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <ac...@hexiscyber.com> wrote:
> hello
> I'm looking at integrating my own process into the Ambari 'Jobs' view … and I
> can see how the web side of things works .. i.e. the view makes REST calls to
> the server which in turn results in a query to postgres to get the job stats …
> but what is not so clear is how those job/task stats get into postgres in the
> first place….
> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting the
> job/task info into postgres directly? Or is there some other mechanism in
> Ambari that is listening for map reduce jobs/tasks to start/finish?
> 
> any hints on where to look in the source tree would be greatly appreciated
> TIA




Re: Jobs view .. how to hook into it....

Posted by Billie Rinaldi <bi...@gmail.com>.
In Hadoop 1 only, there is a log4j appender on the JobTracker/JobHistory
that inserts the data into postgres (or whichever db you have configured).
The code is in contrib/ambari-log4j.
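
For the curious, the relevant snippet of the JobTracker's log4j.properties looks
roughly like the following.  The appender class and property keys here are recalled
from memory rather than copied from the source, so verify them against
contrib/ambari-log4j; the JDBC URL and credentials are placeholders.

# Sketch only -- verify the appender class and keys against contrib/ambari-log4j.
log4j.appender.JHA=org.apache.ambari.log4j.hadoop.mapreduce.jobhistory.JobHistoryAppender
log4j.appender.JHA.driver=org.postgresql.Driver
log4j.appender.JHA.database=jdbc:postgresql://ambari-host:5432/ambarirca
log4j.appender.JHA.user=mapred
log4j.appender.JHA.password=mapred
# Route JobHistory log events to the appender, which parses them and writes
# job/task rows to the database.
log4j.logger.org.apache.hadoop.mapred.JobHistory$JobHistoryLogger=DEBUG,JHA
log4j.additivity.org.apache.hadoop.mapred.JobHistory$JobHistoryLogger=false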

Billie


On Fri, Jan 17, 2014 at 1:59 PM, Aaron Cody <ac...@hexiscyber.com> wrote:

> hello
> I’m looking at integrating my own process into the Ambari ‘Jobs’ view …
> and I can see how the web side of things works .. i.e. the view makes REST
> calls to the server which in turn results in a query to postgres to get the
> job stats … but what is not so clear is how those job/task stats get into
> postgres in the first place….
> Q: for example, with MapReduce .. is Hadoop/JobTracker somehow inserting
> the job/task info into postgres directly? Or is there some other mechanism
> in Ambari that is listening for map reduce jobs/tasks to start/finish?
>
> any hints on where to look in the source tree would be greatly appreciated
> TIA
>