Posted to mapreduce-user@hadoop.apache.org by Lewis John Mcgibbney <le...@gmail.com> on 2012/10/30 23:36:07 UTC

How to Access a Mapper and Reducer Counters (cont'd???)

Hi All,

I am trying to access the number of output records in a map task. I
therefore turned to the o.a.h.mapreduce.Job#getCounters() API and
attempted something like this:

String mapTaskOutputCounterName = MAP_OUTPUT_RECORDS;
...
Counters counter = currentJob.getCounters();
...
counter.getGroup(mapTaskOutputCounterName).size();

However, this always gave me 0 when I pushed the value to LOG output.
I therefore searched the list archives and came across this rather
interesting thread [0], which eventually leads to MAPREDUCE-3520 [1],
highlighting the need for a new interface for metrics to be exchanged
between maps and reduces.

I need to be honest here and say that the integer value I am after is
rather trivial in its purpose (it complements some simple logging
within Nutch 2.x); however, it would be great if someone could provide
me with the code to obtain the correct counter (e.g.
MAP_OUTPUT_RECORDS) from within the Job Counters.
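
To make the question concrete, here is a minimal sketch of the distinction I suspect is involved. It is a plain-Java model using nested maps, not the Hadoop API: the class name, helper names and counter values are all hypothetical. It shows that asking for the size of a group named after a counter is different from looking up one counter's value by group name and counter name. As far as I know the analogous Hadoop call is Counters#findCounter(String, String) followed by Counter#getValue(), but I have not verified that against the version used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of "groups of named counters"; NOT the Hadoop API.
class CounterLookupSketch {

    // Hypothetical stand-in for a job's counters:
    // group name -> (counter name -> value). The values are made up.
    static Map<String, Map<String, Long>> sampleCounters() {
        Map<String, Long> taskGroup = new LinkedHashMap<>();
        taskGroup.put("MAP_INPUT_RECORDS", 100L);
        taskGroup.put("MAP_OUTPUT_RECORDS", 250L);

        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        groups.put("org.apache.hadoop.mapred.Task$Counter", taskGroup);
        return groups;
    }

    // Number of counters in the named group; 0 if no group has that name.
    static int groupSize(Map<String, Map<String, Long>> groups, String groupName) {
        Map<String, Long> group = groups.get(groupName);
        return group == null ? 0 : group.size();
    }

    // Value of a single counter, found by group name and counter name; 0 if absent.
    static long counterValue(Map<String, Map<String, Long>> groups,
                             String groupName, String counterName) {
        Map<String, Long> group = groups.get(groupName);
        return group == null ? 0L : group.getOrDefault(counterName, 0L);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> groups = sampleCounters();
        String taskGroup = "org.apache.hadoop.mapred.Task$Counter";

        // A counter name used as a group name matches no group.
        System.out.println(groupSize(groups, "MAP_OUTPUT_RECORDS"));
        // The size of the real group is a count of counters, not a record count.
        System.out.println(groupSize(groups, taskGroup));
        // The record count is the value of one counter inside that group.
        System.out.println(counterValue(groups, taskGroup, "MAP_OUTPUT_RECORDS"));
    }
}
```

If Hadoop hands back an empty group for an unknown group name (an assumption on my part), that would be consistent with the 0 I am seeing.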

Thank you very much in advance for any help which comes this way.

Lewis

[0] http://www.mail-archive.com/mapreduce-user@hadoop.apache.org/msg03724.html
[1] https://issues.apache.org/jira/browse/MAPREDUCE-3520

-- 
Lewis

Re: How to Access a Mapper and Reducer Counters (cont'd???)

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Hi,

So I was working on this again. To display all counter groups from a
currently running job I've been doing:

Collection<String> mapTaskCounterGroups =
currentJob.getCounters().getGroupNames();

When I print these values to stdout, this gives me the following:
[FileSystemCounters, org.apache.hadoop.mapred.Task$Counter,
org.apache.hadoop.mapreduce.lib.input.FileInputFormat$Counter,
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter]

So when I do

int mapFileSystemCounter = currentJob.getCounters().getGroup
        ("$aboveGroups").size();

for each of the above groups and print the results to stdout, I get:

[FileSystemCounters, org.apache.hadoop.mapred.Task$Counter,
org.apache.hadoop.mapreduce.lib.input.FileInputFormat$Counter,
org.apache.hadoop.mapreduce.lib.output.FileOutputFormat$Counter]
FileSystem Counters: 2
MapRed Counters: 8
Input Counters: 0
Output Counters: 1
Total Counters: 12

This is not making sense to me: the four group sizes sum to 11
(2 + 8 + 0 + 1), yet the total is reported as 12.

Can anyone please help me understand where the missing counter is in
the map job?
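
One way I could narrow this down is to list every counter by name within each group, rather than printing only the group sizes, and to sum the sizes myself. Below is a minimal sketch of that cross-check as a plain-Java model with nested maps. The class, group and counter names are hypothetical and the values are made up; it is not Hadoop code. I believe Counters and CounterGroup are both iterable in o.a.h.mapreduce, so a similar nested loop should be possible there, but I have not run that.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model for cross-checking per-group sizes against a total; NOT the Hadoop API.
class CounterDumpSketch {

    // Hypothetical data: group name -> (counter name -> value).
    static Map<String, Map<String, Long>> sampleCounters() {
        Map<String, Long> groupA = new LinkedHashMap<>();
        groupA.put("first", 1L);
        groupA.put("second", 2L);

        Map<String, Long> groupB = new LinkedHashMap<>(); // deliberately empty

        Map<String, Long> groupC = new LinkedHashMap<>();
        groupC.put("third", 3L);

        Map<String, Map<String, Long>> groups = new LinkedHashMap<>();
        groups.put("GroupA", groupA);
        groups.put("GroupB", groupB);
        groups.put("GroupC", groupC);
        return groups;
    }

    // Sum of the per-group sizes, i.e. the number of counters across all groups.
    static int totalCounters(Map<String, Map<String, Long>> groups) {
        int total = 0;
        for (Map<String, Long> group : groups.values()) {
            total += group.size();
        }
        return total;
    }

    // One line per group (with its size), followed by one indented line per counter.
    static String describe(Map<String, Map<String, Long>> groups) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Map<String, Long>> group : groups.entrySet()) {
            out.append(group.getKey())
               .append(" (").append(group.getValue().size()).append(" counters)\n");
            for (Map.Entry<String, Long> counter : group.getValue().entrySet()) {
                out.append("  ").append(counter.getKey())
                   .append(" = ").append(counter.getValue()).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Long>> groups = sampleCounters();
        System.out.print(describe(groups));
        System.out.println("Total counters: " + totalCounters(groups));
    }
}
```

If the total computed this way disagrees with the total reported by the job, the per-counter listing should at least show which group the extra counter belongs to.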

Thanks in advance for any help on this one, it is greatly appreciated.

Lewis






-- 
Lewis