Posted to common-dev@hadoop.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2007/01/02 13:37:03 UTC
Per-job counters
Hi,
I'm trying to figure out how to implement per-job counters. Google's
paper on map-reduce mentions that their API allows individual tasks to
update global counters, defined for each job, and then easily retrieve
them when the job is completed.
Example: process some records in a map-reduce job (with many map and
reduce tasks), and at the end of the job emit the total count of
processed records for the whole job (or any other programmer-defined
count aggregated during processing).
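Conceptually, what I'd like is something like the following sketch. This is just self-contained Java simulating the idea; none of these classes or method names exist in Hadoop today, they are purely hypothetical:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each task accumulates named counters locally,
// and the framework sums them per job once all tasks have finished.
public class CounterSketch {

    // Per-task counters, as a single map or reduce task would build them.
    static Map<String, Long> runTask(int records) {
        Map<String, Long> counters = new HashMap<>();
        for (int i = 0; i < records; i++) {
            counters.merge("records.processed", 1L, Long::sum);
        }
        return counters;
    }

    // Job-level aggregation: fold every task's counters into one total.
    static Map<String, Long> aggregate(Iterable<Map<String, Long>> perTask) {
        Map<String, Long> totals = new HashMap<>();
        for (Map<String, Long> t : perTask) {
            t.forEach((name, value) -> totals.merge(name, value, Long::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Long>> tasks = List.of(runTask(3), runTask(5), runTask(2));
        System.out.println(aggregate(tasks).get("records.processed")); // 10
    }
}
```

The point is that the per-task bookkeeping is trivial; what is missing is the framework doing the aggregation step for us.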
I was looking at the metrics API, but it's not obvious to me if it's
useful in this case ... if so, how should I go about it?
I could probably implement extended OutputFormat-s that write these
counters out to a separate file per task, and then read them back at
the end of the job, but this seems awfully intrusive and complex for
such simple functionality...
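To make that workaround concrete, here is a rough self-contained sketch of what I mean. The side-file naming and the tab-separated format are made up for illustration; a real implementation would hook into an OutputFormat and write to the job's output directory on DFS:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

// Workaround sketch: each task dumps its counters as "name<TAB>count"
// lines into its own side file; the job driver sums the files afterwards.
public class SideFileCounters {

    static void writeTaskCounters(Path dir, String taskId,
                                  Map<String, Long> counters) throws IOException {
        StringBuilder sb = new StringBuilder();
        counters.forEach((n, v) -> sb.append(n).append('\t').append(v).append('\n'));
        Files.writeString(dir.resolve("counters-" + taskId), sb.toString());
    }

    static Map<String, Long> sumCounters(Path dir) throws IOException {
        Map<String, Long> totals = new HashMap<>();
        try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(dir, "counters-*")) {
            for (Path f : files) {
                for (String line : Files.readAllLines(f)) {
                    String[] parts = line.split("\t");
                    totals.merge(parts[0], Long.parseLong(parts[1]), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job-counters");
        writeTaskCounters(dir, "task1", Map.of("records.processed", 4L));
        writeTaskCounters(dir, "task2", Map.of("records.processed", 6L));
        System.out.println(sumCounters(dir).get("records.processed")); // 10
    }
}
```

Even this toy version shows the awkwardness: every job has to manage an extra set of files, handle speculative or failed task attempts, and clean up afterwards.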
I'd appreciate any suggestions.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___| || __|  \| || |   Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
Re: Per-job counters
Posted by Andrzej Bialecki <ab...@getopt.org>.
Arkady Borkovsky wrote:
> see also
> http://issues.apache.org/jira/browse/HADOOP-492
Indeed, that's exactly the same issue I'm facing. And I can't see how
the current metrics API would help.
I'll add my vote to this issue ...
--
Best regards,
Andrzej Bialecki <><
Re: Per-job counters
Posted by Arkady Borkovsky <ar...@yahoo-inc.com>.
see also
http://issues.apache.org/jira/browse/HADOOP-492
On Jan 2, 2007, at 8:57 AM, Runping Qi wrote:
>
> I used to declare various counters in my map/reduce classes to keep track
> of various statistics of my jobs. Typically, those counters are initialized
> by the configure method, updated by the map/reduce methods, and finalized
> by the close method. However, those counters are kept on a per-task basis,
> and I don't have a good way to aggregate them across an entire job. It
> would be nice to introduce an API that lets each map/reduce task initialize
> its own counters and register them with the job. At the end of the job, the
> job tracker could automatically aggregate them and make them available
> through an API or through the job status.
>
> Runping
RE: Per-job counters
Posted by Runping Qi <ru...@yahoo-inc.com>.
I used to declare various counters in my map/reduce classes to keep track of
various statistics of my jobs. Typically, those counters are initialized by
the configure method, updated by the map/reduce methods, and finalized by
the close method. However, those counters are kept on a per-task basis, and
I don't have a good way to aggregate them across an entire job. It would be
nice to introduce an API that lets each map/reduce task initialize its own
counters and register them with the job. At the end of the job, the job
tracker could automatically aggregate them and make them available through
an API or through the job status.
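To illustrate, the aggregation side of such an API might look roughly like this self-contained sketch. This is hypothetical; no such class exists in Hadoop, and a real version would live in the job tracker and receive counter reports over RPC:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical register-then-aggregate pattern: a task builds up named
// counters during map/reduce, and when the task closes, the framework
// folds its final values into job-wide totals.
public class JobCounterRegistry {

    private final Map<String, LongAdder> jobTotals = new ConcurrentHashMap<>();

    // Called by the framework when a task finishes ("close"): merge the
    // task's final counter values into the job-wide totals.
    public void reportTask(Map<String, Long> taskCounters) {
        taskCounters.forEach((name, value) ->
                jobTotals.computeIfAbsent(name, k -> new LongAdder()).add(value));
    }

    // Would be exposed through the job status once the job completes.
    public long getCounter(String name) {
        LongAdder a = jobTotals.get(name);
        return a == null ? 0 : a.sum();
    }

    public static void main(String[] args) {
        JobCounterRegistry job = new JobCounterRegistry();
        job.reportTask(Map.of("bad.records", 2L, "records", 100L));
        job.reportTask(Map.of("bad.records", 1L, "records", 250L));
        System.out.println(job.getCounter("records"));     // 350
        System.out.println(job.getCounter("bad.records")); // 3
    }
}
```

The user-visible part would then just be increment calls in map/reduce and a lookup on the completed job; all the plumbing above happens inside the framework.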
Runping