Posted to common-dev@hadoop.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2007/01/02 13:37:03 UTC

Per-job counters

Hi,

I'm trying to figure out how to implement per-job counters. Google's 
paper on map-reduce mentions that their API allows individual tasks to 
update global counters, defined for each job, and then easily retrieve 
them when the job is completed.

Example: process some records in a map-reduce job (with many map and 
reduce tasks), and at the end of the job emit the total count of 
processed records for the whole job (or any other programmer-defined 
count aggregated during processing).

I was looking at the metrics API, but it's not obvious to me if it's 
useful in this case ... if so, how should I go about it?

I could probably implement extended OutputFormat-s that write these 
counters out to a separate file for each task, and then read them at 
the end of the job, but this seems awfully intrusive and complex for 
such simple functionality...

I'd appreciate any suggestions.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Per-job counters

Posted by Andrzej Bialecki <ab...@getopt.org>.
Arkady Borkovsky wrote:
> see also
> http://issues.apache.org/jira/browse/HADOOP-492


Indeed, that's exactly the same issue I'm facing. And I can't see how 
the current metrics API would help.

I'll add my vote to this issue ...

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Per-job counters

Posted by Arkady Borkovsky <ar...@yahoo-inc.com>.
see also
http://issues.apache.org/jira/browse/HADOOP-492

On Jan 2, 2007, at 8:57 AM, Runping Qi wrote:

>
> I used to declare various counters in my map/reduce classes to keep
> track of various statistics of my jobs. Typically, those counters are
> initialized by the configure method, updated by the map/reduce methods,
> and finalized by the close method. However, those counters are kept on
> a per-task basis, and I don't have a good way to aggregate them across
> an entire job. It would be nice to introduce an API to let each
> map/reduce task initialize its own counters and register them with the
> job. At the end of the job, the job tracker could automatically
> aggregate them and make them available through an API or through the
> job status.
>
> Runping
>
>
>> -----Original Message-----
>> From: Andrzej Bialecki [mailto:ab@getopt.org]
>> Sent: Tuesday, January 02, 2007 4:37 AM
>> To: hadoop-dev@lucene.apache.org
>> Subject: Per-job counters
>>
>> Hi,
>>
>> I'm trying to figure out how to implement per-job counters. Google's
>> paper on map-reduce mentions that their API allows individual tasks to
>> update global counters, defined for each job, and then easily retrieve
>> them when the job is completed.
>>
>> Example: process some records in a map-reduce job (with many map and
>> reduce tasks), and at the end of the job emit the total count of
>> processed records for the whole job (or any other programmer-defined
>> count aggregated during processing).
>>
>> I was looking at the metrics API, but it's not obvious to me if it's
>> useful in this case ... if so, how should I go about it?
>>
>> I could probably implement extended OutputFormat-s that write these
>> counters out to a separate file for each task, and then read them at
>> the end of the job, but this seems awfully intrusive and complex for
>> such simple functionality...
>>
>> I'd appreciate any suggestions.
>>
>> --
>> Best regards,
>> Andrzej Bialecki     <><
>>  ___. ___ ___ ___ _ _   __________________________________
>> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
>> ___|||__||  \|  ||  |  Embedded Unix, System Integration
>> http://www.sigram.com  Contact: info at sigram dot com
>>
>
>


RE: Per-job counters

Posted by Runping Qi <ru...@yahoo-inc.com>.
I used to declare various counters in my map/reduce classes to keep track of
various statistics of my jobs. Typically, those counters are initialized by
the configure method, updated by the map/reduce methods, and finalized by
the close method. However, those counters are kept on a per-task basis, and I
don't have a good way to aggregate them across an entire job. It would be nice
to introduce an API to let each map/reduce task initialize its own
counters and register them with the job. At the end of the job, the job
tracker could automatically aggregate them and make them available through an
API or through the job status.
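The lifecycle described above (initialize in configure, update in map, hand back in close) and the missing job-wide merge step can be sketched without the Hadoop classes as follows. Every name here is illustrative, not part of any real API; the main method plays the role the job tracker would play.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-task named counters with job-wide aggregation.
// Hypothetical names only; not the Hadoop map/reduce API.
public class TaskCounters {
    private Map<String, Long> counters;

    void configure() {                        // per-task initialization
        counters = new HashMap<>();
    }

    void map(String record) {                 // updated while processing
        counters.merge("records.processed", 1L, Long::sum);
        if (record.isEmpty()) {
            counters.merge("records.empty", 1L, Long::sum);
        }
    }

    Map<String, Long> close() {               // finalized and handed back
        return counters;
    }

    // The missing piece: something job-tracker-like that merges every
    // task's counters into one job-wide view at the end of the job.
    public static void main(String[] args) {
        Map<String, Long> jobWide = new HashMap<>();
        String[][] splits = { {"a", ""}, {"b", "c", ""} };  // two map tasks
        for (String[] split : splits) {
            TaskCounters task = new TaskCounters();
            task.configure();
            for (String rec : split) task.map(rec);
            task.close().forEach((k, v) -> jobWide.merge(k, v, Long::sum));
        }
        System.out.println(jobWide.get("records.processed"));  // prints 5
        System.out.println(jobWide.get("records.empty"));      // prints 2
    }
}
```

Because counter values are plain longs merged by addition, the aggregation is order-independent, which is what would let a job tracker fold in task results as they complete.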

Runping


> -----Original Message-----
> From: Andrzej Bialecki [mailto:ab@getopt.org]
> Sent: Tuesday, January 02, 2007 4:37 AM
> To: hadoop-dev@lucene.apache.org
> Subject: Per-job counters
> 
> Hi,
> 
> I'm trying to figure out how to implement per-job counters. Google's
> paper on map-reduce mentions that their API allows individual tasks to
> update global counters, defined for each job, and then easily retrieve
> them when the job is completed.
> 
> Example: process some records in a map-reduce job (with many map and
> reduce tasks), and at the end of the job emit the total count of
> processed records for the whole job (or any other programmer-defined
> count aggregated during processing).
> 
> I was looking at the metrics API, but it's not obvious to me if it's
> useful in this case ... if so, how should I go about it?
> 
> I could probably implement extended OutputFormat-s that write these
> counters out to a separate file for each task, and then read them at
> the end of the job, but this seems awfully intrusive and complex for
> such simple functionality...
> 
> I'd appreciate any suggestions.
> 
> --
> Best regards,
> Andrzej Bialecki     <><
>  ___. ___ ___ ___ _ _   __________________________________
> [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
> ___|||__||  \|  ||  |  Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
>