Posted to mapreduce-user@hadoop.apache.org by Bart Vandewoestyne <Ba...@telenet.be> on 2014/10/23 13:32:57 UTC

getting counters from specific hadoop jobs

Hello list,

In order to learn about Hadoop performance tuning, I am currently 
investigating the effect of certain Hadoop configuration parameters on 
certain Hadoop counters.  I would like to do something like the 
following (from the command line):

for some_config_parameter in set_of_config_values

   Step 1) run hadoop job with 'hadoop jar ....'

   Step 2) once job finished, get the value of one or more Hadoop 
counters of this job

I know that I can achieve step 2 with the -counter option of the mapred 
job command:

bart@sandy-quad-1:~$ mapred job -counter
Usage: CLI [-counter <job-id> <group-name> <counter-name>]

However, I need to specify a job-id here, and that is where I'm having 
trouble... I don't know an easy way to get the job-id from the hadoop 
job that I started in Step 1.  I also don't know of a way to specify a 
job-id myself in Step 1 so that I can use it later in Step 2.

I cannot imagine I'm the only one trying to run jobs and requesting some 
of the counters afterwards.  How is this typically solved?

Note that I'm looking for a command-line solution, something that is 
scriptable in bash or similar.

Thanks,
Bart

Re: getting counters from specific hadoop jobs

Posted by Mahesh Kumar Vasanthu Somashekar <mv...@pivotal.io>.
Hi Bart,

There are REST APIs available through which the list of apps/jobs and
their counters can be requested.

Check the links below:
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html
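
For example, to pull one counter of a finished job from the History Server
(a minimal sketch: the host, port, and job id below are placeholder
assumptions; the endpoint path is from the History Server REST docs linked
above):

```shell
# Hypothetical example: host/port and job id are placeholders.
# curl -s "http://historyserver.example.com:19888/ws/v1/history/mapreduce/jobs/job_1414067233243_0001/counters"
#
# The response is JSON; without extra tooling, a crude grep can extract a
# single value. Demonstrated here on a trimmed sample response:
sample='{"jobCounters":{"id":"job_1414067233243_0001","counterGroup":[{"counterGroupName":"org.apache.hadoop.mapreduce.FileSystemCounter","counter":[{"name":"HDFS_BYTES_READ","totalCounterValue":2483}]}]}}'
echo "$sample" | grep -o '"totalCounterValue":[0-9]*' | grep -o '[0-9]*'
```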

Thanks,
Mahesh


On Thu, Oct 23, 2014 at 8:43 AM, Thomas Demoor <th...@amplidata.com>
wrote:

> In the log files of your application you will find your job/application id
> (I guess the message has log level INFO, so logging needs to be at least
> that high).
>
> Good luck,
> Thomas
>
>

Re: getting counters from specific hadoop jobs

Posted by Thomas Demoor <th...@amplidata.com>.
In the log files of your application you will find your job/application id
(I guess the message has log level INFO, so logging needs to be at least
that high).

Good luck,
Thomas
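
A concrete sketch of that approach (the hadoop/mapred invocations are
commented-out placeholders; the jar name, the parameter being swept, and the
paths are illustrative assumptions; only the grep on the client's
"Running job: job_..." INFO line is shown live):

```shell
# Rough sketch only: the hadoop/mapred invocations below are commented-out
# placeholders; jar name, swept parameter, and paths are illustrative.
# for v in 100 200 400; do
#   hadoop jar hadoop-mapreduce-examples.jar terasort \
#       -Dmapreduce.task.io.sort.mb=$v input output_$v 2>&1 | tee run_$v.log
#   job_id=$(grep -o 'job_[0-9]*_[0-9]*' run_$v.log | head -n 1)
#   mapred job -counter "$job_id" \
#       org.apache.hadoop.mapreduce.FileSystemCounter HDFS_BYTES_READ
# done
#
# The job-id extraction itself, shown on a sample client log line:
sample='14/10/23 13:40:12 INFO mapreduce.Job: Running job: job_1414067233243_0003'
echo "$sample" | grep -o 'job_[0-9]*_[0-9]*'
```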

Re: getting counters from specific hadoop jobs

Posted by Bart Vandewoestyne <Ba...@telenet.be>.
On 10/23/2014 02:56 PM, Dieter De Witte wrote:
> Maybe you could use job -list or job -history to get a list of the
> jobids and extract it from there?

That was indeed one of the methods I was thinking of, but I cannot think 
of a reliable way of implementing it.

Suppose I start a job with hadoop jar, and I wait until it is finished 
and then use `mapred job -list all` to somehow find out the job-id of my 
job that just finished.  Then how do I know what line in the output of 
`mapred job -list all` corresponds to the job I executed?  Even if the 
job output list were sorted by start time, I could not be sure that the 
last started job is mine, because another user could have started 
another job after me...

A mechanism that would easily allow a user to get the job-id of a job 
they just started would be nice to have.  Doesn't this exist?

Maybe grepping through the output of `mapred job -history all` would be 
the best solution to get to the counter information?  Unfortunately, I 
currently cannot test this approach as I am experiencing the following 
error:

bart@sandy-quad-1:~$ mapred job -history all 
/user/bart/terasort/output/0050GB
14/10/23 16:03:12 INFO client.RMProxy: Connecting to ResourceManager at 
sandy-quad-1.sslab.lan/192.168.35.75:8032
Ignore unrecognized file: 0050GB
Exception in thread "main" java.io.IOException: Unable to initialize 
History Viewer
	at 
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:90)
	at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:470)
	at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
	at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1239)
Caused by: java.io.IOException: Unable to initialize History Viewer
	at 
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:84)
	... 5 more

:-(

Kind regards,
Bart

Re: getting counters from specific hadoop jobs

Posted by Dieter De Witte <dr...@gmail.com>.
Maybe you could use job -list or job -history to get a list of the jobids
and extract it from there?
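
A fragile but scriptable sketch of that idea (the sample lines and the
column layout are assumptions; check the header line your
`mapred job -list all` actually prints before relying on field positions):

```shell
# Hypothetical sketch: pick the most recently started job id by sorting a
# 'mapred job -list all'-style listing on an assumed StartTime column.
sample_list='job_1414067233243_0001 SUCCEEDED 1414066000000 bart default
job_1414067233243_0003 SUCCEEDED 1414068000000 bart default'
echo "$sample_list" | sort -k3,3n | tail -n 1 | awk '{print $1}'
```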

2014-10-23 13:32 GMT+02:00 Bart Vandewoestyne <Bart.Vandewoestyne@telenet.be
>:

> Hello list,
>
> In order to learn about Hadoop performance tuning, I am currently
> investigating the effect of certain Hadoop configuration parameters on
> certain Hadoop counters.  I would like to do something like the following
> (from the command line):
>
> for some_config_parameter in set_of_config_values
>
>   Step 1) run hadoop job with 'hadoop jar ....'
>
>   Step 2) once job finished, get the value of one or more Hadoop counters
> of this job
>
> I know that I can achieve step 2 with the -counter option of the mapred
> job command:
>
> bart@sandy-quad-1:~$ mapred job -counter
> Usage: CLI [-counter <job-id> <group-name> <counter-name>]
>
> However, I need to specify a job-id here, and that is where I'm having
> trouble... I don't know an easy way to get the job-id from the hadoop job
> that I started in Step 1.  I also don't know of a way to specify a job-id
> myself in Step 1 so that I can use it later in Step 2.
>
> I cannot imagine I'm the only one trying to run jobs and requesting some
> of the counters afterwards.  How is this typically solved?
>
> Note that I'm looking for a command-line solution, something that is
> scriptable bash or so.
>
> Thanks,
> Bart
>

Re: getting counters from specific hadoop jobs

Posted by Thomas Demoor <th...@amplidata.com>.
Hi Bart,

Dieter beat me to it. An alternative would be grepping from the logs.

Furthermore, if you write/alter the source code of the applications
yourself, rather than using e.g. the examples included with Hadoop, you can
access the id through job.getJobID() once the job has been submitted and
process (print) it to your liking. More info on the Job interface:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Job_Submission_and_Monitoring

Good luck and nice to see Belgian academics with interest in Hadoop,
Thomas

Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833

On Thu, Oct 23, 2014 at 1:32 PM, Bart Vandewoestyne <
Bart.Vandewoestyne@telenet.be> wrote:

> Hello list,
>
> In order to learn about Hadoop performance tuning, I am currently
> investigating the effect of certain Hadoop configuration parameters on
> certain Hadoop counters.  I would like to do something like the following
> (from the command line):
>
> for some_config_parameter in set_of_config_values
>
>   Step 1) run hadoop job with 'hadoop jar ....'
>
>   Step 2) once job finished, get the value of one or more Hadoop counters
> of this job
>
> I know that I can achieve step 2 with the -counter option of the mapred
> job command:
>
> bart@sandy-quad-1:~$ mapred job -counter
> Usage: CLI [-counter <job-id> <group-name> <counter-name>]
>
> However, I need to specify a job-id here, and that is where I'm having
> trouble... I don't know an easy way to get the job-id from the hadoop job
> that I started in Step 1.  I also don't know of a way to specify a job-id
> myself in Step 1 so that I can use it later in Step 2.
>
> I cannot imagine I'm the only one trying to run jobs and requesting some
> of the counters afterwards.  How is this typically solved?
>
> Note that I'm looking for a command-line solution, something that is
> scriptable bash or so.
>
> Thanks,
> Bart
>
