Posted to mapreduce-user@hadoop.apache.org by Bart Vandewoestyne <Ba...@telenet.be> on 2014/10/23 13:32:57 UTC
getting counters from specific hadoop jobs
Hello list,
In order to learn about Hadoop performance tuning, I am currently
investigating the effect of certain Hadoop configuration parameters on
certain Hadoop counters. I would like to do something like the
following (from the command line):
for some_config_parameter in set_of_config_values
Step 1) run hadoop job with 'hadoop jar ....'
Step 2) once job finished, get the value of one or more Hadoop
counters of this job
I know that I can achieve step 2 with the -counter option of the mapred
job command:
bart@sandy-quad-1:~$ mapred job -counter
Usage: CLI [-counter <job-id> <group-name> <counter-name>]
However, I need to specify a job-id here, and that is where I'm having
trouble... I don't know an easy way to get the job-id from the hadoop
job that I started in Step 1. I also don't know of a way to specify a
job-id myself in Step 1 so that I can use it later in Step 2.
I cannot imagine I'm the only one trying to run jobs and requesting some
of the counters afterwards. How is this typically solved?
Note that I'm looking for a command-line solution, something that is
scriptable in bash or similar.
Thanks,
Bart
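The two steps above can be sketched in bash roughly as follows. This is an illustrative sketch, not from the thread: the "Running job:" client log line, the example jar name, the swept parameter (mapreduce.task.io.sort.mb) and the counter queried are all assumptions.

```shell
# Extract the first job_<clusterTimestamp>_<sequence> token from client output.
extract_job_id() {
  grep -o 'job_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Intended usage inside the sweep (commented out; names are placeholders):
#   for mb in 100 200 400; do
#     hadoop jar hadoop-mapreduce-examples.jar terasort \
#         -Dmapreduce.task.io.sort.mb="$mb" input "output_$mb" 2>&1 | tee "run_$mb.log"
#     jobid=$(extract_job_id < "run_$mb.log")
#     mapred job -counter "$jobid" \
#         org.apache.hadoop.mapreduce.TaskCounter SPILLED_RECORDS
#   done

# Demo of the extraction on a canned client log line:
sample='14/10/23 13:33:01 INFO mapreduce.Job: Running job: job_1414063977308_0007'
printf '%s\n' "$sample" | extract_job_id   # -> job_1414063977308_0007
```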
Re: getting counters from specific hadoop jobs
Posted by Mahesh Kumar Vasanthu Somashekar <mv...@pivotal.io>.
Hi Bart,
There are REST APIs available with which you can request the list of
apps/jobs and their counters.
Check the links below:
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/MapredAppMasterRest.html
https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-site/HistoryServerRest.html
Thanks,
Mahesh
On Thu, Oct 23, 2014 at 8:43 AM, Thomas Demoor <th...@amplidata.com>
wrote:
> In the log files of your application you will find your job/application id
> (I guess the message has log level INFO, so logging needs to be at least
> that high).
>
> Good luck,
> Thomas
>
>
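To make the History Server route concrete, a rough sketch follows. The default port 19888, the placeholder host and job id, and the JSON field names are assumptions based on the v1 API linked above.

```shell
# Fetching a finished job's counters over REST (commented out; host and
# job id are placeholders):
#   curl -s "http://historyserver:19888/ws/v1/history/mapreduce/jobs/job_1414063977308_0007/counters"

# A small filter that pulls one counter's total out of the JSON response:
get_counter() {
  python3 -c '
import json, sys
doc = json.load(sys.stdin)
for group in doc["jobCounters"]["counterGroup"]:
    for counter in group["counter"]:
        if counter["name"] == sys.argv[1]:
            print(counter["totalCounterValue"])
' "$1"
}

# Demo on a canned (abridged) response:
sample='{"jobCounters":{"id":"job_1414063977308_0007","counterGroup":[{"counterGroupName":"org.apache.hadoop.mapreduce.TaskCounter","counter":[{"name":"SPILLED_RECORDS","totalCounterValue":163840}]}]}}'
printf '%s' "$sample" | get_counter SPILLED_RECORDS   # -> 163840
```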
Re: getting counters from specific hadoop jobs
Posted by Thomas Demoor <th...@amplidata.com>.
In the log files of your application you will find your job/application id
(I guess the message has log level INFO, so logging needs to be at least
that high).
Good luck,
Thomas
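To make the log-grepping concrete: the client also logs the YARN application id at submission, and the job id is the same timestamp/sequence pair with a different prefix, so one can be derived from the other. A sketch, where the quoted log line format is an assumption that may differ per Hadoop version:

```shell
# The client typically logs a line like:
#   INFO impl.YarnClientImpl: Submitted application application_1414063977308_0007
# Job ids share the <timestamp>_<sequence> suffix, so the prefix can be swapped:
app_to_job_id() {
  sed 's/^application_/job_/'
}

echo 'application_1414063977308_0007' | app_to_job_id   # -> job_1414063977308_0007
```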
Re: getting counters from specific hadoop jobs
Posted by Bart Vandewoestyne <Ba...@telenet.be>.
On 10/23/2014 02:56 PM, Dieter De Witte wrote:
> Maybe you could use job -list or job -history to get a list of the
> jobids and extract it from there?
That was indeed one of the methods I was thinking of, but I cannot think
of a reliable way of implementing it.
Suppose I start a job with hadoop jar, and I wait until it is finished
and then use `mapred job -list all` to somehow find out the job-id of my
job that just finished. Then how do I know what line in the output of
`mapred job -list all` corresponds to the job I executed? Even if the
job output list would be sorted by start time, then I cannot be sure
that the last started job is mine because another user could have
started another job after me...
A mechanism that would easily allow a user to get the job-id of a job
they just started would be nice to have. Doesn't this exist?
Maybe grepping through the output of `mapred job -history all` would be
the best solution to get to the counter information? Unfortunately, I
currently cannot test this approach as I am experiencing the following
error:
bart@sandy-quad-1:~$ mapred job -history all
/user/bart/terasort/output/0050GB
14/10/23 16:03:12 INFO client.RMProxy: Connecting to ResourceManager at
sandy-quad-1.sslab.lan/192.168.35.75:8032
Ignore unrecognized file: 0050GB
Exception in thread "main" java.io.IOException: Unable to initialize
History Viewer
at
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:90)
at org.apache.hadoop.mapreduce.tools.CLI.viewHistory(CLI.java:470)
at org.apache.hadoop.mapreduce.tools.CLI.run(CLI.java:313)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.mapred.JobClient.main(JobClient.java:1239)
Caused by: java.io.IOException: Unable to initialize History Viewer
at
org.apache.hadoop.mapreduce.jobhistory.HistoryViewer.<init>(HistoryViewer.java:84)
... 5 more
:-(
Kind regards,
Bart
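One workaround sketch for the race described above: the job id cannot be chosen up front, but the job name can (via -Dmapreduce.job.name=..., assuming the job reads generic options through ToolRunner), so a unique tag can be attached to each run and looked up afterwards in the `mapred job -list all` output. Jar names, paths and the canned output line below are illustrative assumptions.

```shell
# Tag each run uniquely, then recover the job id by the tag.
tag="perf-sweep-$$-$(date +%s)"

# Intended usage (commented out; jar and paths are placeholders):
#   hadoop jar hadoop-mapreduce-examples.jar terasort \
#       -Dmapreduce.job.name="$tag" input output
#   jobid=$(mapred job -list all 2>/dev/null | grep "$tag" | awk '{print $1}')

# Demo of the lookup on a canned '-list all'-style line (column layout varies
# by version; the job id is assumed to be the first field):
line='job_1414063977308_0009 SUCCEEDED 1414063999 bart default NORMAL perf-sweep-123'
printf '%s\n' "$line" | grep 'perf-sweep-123' | awk '{print $1}'   # -> job_1414063977308_0009
```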
Re: getting counters from specific hadoop jobs
Posted by Dieter De Witte <dr...@gmail.com>.
Maybe you could use job -list or job -history to get a list of the jobids
and extract it from there?
2014-10-23 13:32 GMT+02:00 Bart Vandewoestyne <Bart.Vandewoestyne@telenet.be
>:
> Hello list,
>
> In order to learn about Hadoop performance tuning, I am currently
> investigating the effect of certain Hadoop configuration parameters on
> certain Hadoop counters. I would like to do something like the following
> (from the command line):
>
> for some_config_parameter in set_of_config_values
>
> Step 1) run hadoop job with 'hadoop jar ....'
>
> Step 2) once job finished, get the value of one or more Hadoop counters
> of this job
>
> I know that I can achieve step 2 with the -counter option of the mapred
> job command:
>
> bart@sandy-quad-1:~$ mapred job -counter
> Usage: CLI [-counter <job-id> <group-name> <counter-name>]
>
> However, I need to specify a job-id here, and that is where I'm having
> trouble... I don't know an easy way to get the job-id from the hadoop job
> that I started in Step 1. I also don't know of a way to specify a job-id
> myself in Step 1 so that I can use it later in Step 2.
>
> I cannot imagine I'm the only one trying to run jobs and requesting some
> of the counters afterwards. How is this typically solved?
>
> Note that I'm looking for a command-line solution, something that is
> scriptable bash or so.
>
> Thanks,
> Bart
>
Re: getting counters from specific hadoop jobs
Posted by Thomas Demoor <th...@amplidata.com>.
Hi Bart,
Dieter beat me to it. An alternative would be grepping the logs.
Furthermore, if you write/alter the source code of the applications
yourself rather than using, e.g., the examples included with Hadoop, you
can access the id through job.getJobID() once the job has been submitted
and process (print) it to your liking. More info on the Job interface:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Job_Submission_and_Monitoring
Good luck and nice to see Belgian academics with interest in Hadoop,
Thomas
Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833
On Thu, Oct 23, 2014 at 1:32 PM, Bart Vandewoestyne <
Bart.Vandewoestyne@telenet.be> wrote:
> Hello list,
>
> In order to learn about Hadoop performance tuning, I am currently
> investigating the effect of certain Hadoop configuration parameters on
> certain Hadoop counters. I would like to do something like the following
> (from the command line):
>
> for some_config_parameter in set_of_config_values
>
> Step 1) run hadoop job with 'hadoop jar ....'
>
> Step 2) once job finished, get the value of one or more Hadoop counters
> of this job
>
> I know that I can achieve step 2 with the -counter option of the mapred
> job command:
>
> bart@sandy-quad-1:~$ mapred job -counter
> Usage: CLI [-counter <job-id> <group-name> <counter-name>]
>
> However, I need to specify a job-id here, and that is where I'm having
> trouble... I don't know an easy way to get the job-id from the hadoop job
> that I started in Step 1. I also don't know of a way to specify a job-id
> myself in Step 1 so that I can use it later in Step 2.
>
> I cannot imagine I'm the only one trying to run jobs and requesting some
> of the counters afterwards. How is this typically solved?
>
> Note that I'm looking for a command-line solution, something that is
> scriptable bash or so.
>
> Thanks,
> Bart
>