Posted to mapreduce-user@hadoop.apache.org by Jeremy Lewi <je...@lewi.us> on 2012/10/03 17:17:00 UTC

Counters that track the max value

Hi hadoop-users,

I'm curious if there is an implementation somewhere of a counter which
tracks the maximum of some value across all mappers or reducers?

Thanks
J

Re: Counters that track the max value

Posted by Jeremy Lewi <je...@lewi.us>.

Done.
https://issues.apache.org/jira/browse/MAPREDUCE-4709

Thanks
J

On Fri, Oct 5, 2012 at 10:13 AM, Harsh J <ha...@cloudera.com> wrote:

> Jeremy,
>
> I suppose that's doable. Please file a MAPREDUCE JIRA so you can
> discuss this with others on the development side as well.
>
> I am guessing that MAX operations of most of the user-oriented data
> flow front-ends such as Hive and Pig already do this efficiently, so
> perhaps there hasn't been a very strong need for this.
>
> On Fri, Oct 5, 2012 at 9:18 PM, Jeremy Lewi <je...@lewi.us> wrote:
> > Hi Harsh,
> >
> > Thank you very much, that will work.
> >
> > How come we can't simply create a modification of a regular mapreduce
> > counter which does this behind the scenes? It seems like we should just be
> > able to replace "+" with "max" and everything else should work?
> >
> > J
> >
> >
> > On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <ha...@cloudera.com> wrote:
> >>
> >> Jeremy,
> >>
> >> Here's my shot at it (pardon the quick crappy code):
> >> https://gist.github.com/3828246
> >>
> >> Basically - you can achieve it in two ways:
> >>
> >> Requirement:  All tasks must increment the "max" designated counter
> >> only AFTER the max has been computed (i.e. in cleanup).
> >>
> >> 1. All tasks may use same counter name. Later, we pull per-task
> >> counters and determine the max at the client. (This is my quick and
> >> dirty implementation)
> >> 2. All tasks may use their own task ID (Number part) in the counter
> >> name, but use the same group. Later, we fetch all counters for that
> >> group and iterate over it to find the max. This is cleaner, and
> >> doesn't end up using deprecated APIs such as the above.
> >>
> >> Does this help?
> >>
> >> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <je...@lewi.us> wrote:
> >> > HI hadoop-users,
> >> >
> >> > I'm curious if there is an implementation somewhere of a counter which
> >> > tracks the maximum of some value across all mappers or reducers?
> >> >
> >> > Thanks
> >> > J
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

Re: Counters that track the max value

Posted by Harsh J <ha...@cloudera.com>.

Jeremy,

I suppose that's doable. Please file a MAPREDUCE JIRA so you can
discuss this with others on the development side as well.

I am guessing that MAX operations of most of the user-oriented data
flow front-ends such as Hive and Pig already do this efficiently, so
perhaps there hasn't been a very strong need for this.

On Fri, Oct 5, 2012 at 9:18 PM, Jeremy Lewi <je...@lewi.us> wrote:
> HI Harsh,
>
> Thank you very much that will work.
>
> How come we can't simply create a modification of a regular mapreduce
> counter which does this behind the scenes? It seems like we should just be
> able to replace "+" with "max" and everything else should work?
>
> J
>
>
> On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <ha...@cloudera.com> wrote:
>>
>> Jeremy,
>>
>> Here's my shot at it (pardon the quick crappy code):
>> https://gist.github.com/3828246
>>
>> Basically - you can achieve it in two ways:
>>
>> Requirement:  All tasks must increment the "max" designated counter
>> only AFTER the max has been computed (i.e. in cleanup).
>>
>> 1. All tasks may use same counter name. Later, we pull per-task
>> counters and determine the max at the client. (This is my quick and
>> dirty implementation)
>> 2. All tasks may use their own task ID (Number part) in the counter
>> name, but use the same group. Later, we fetch all counters for that
>> group and iterate over it to find the max. This is cleaner, and
>> doesn't end up using deprecated APIs such as the above.
>>
>> Does this help?
>>
>> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <je...@lewi.us> wrote:
>> > HI hadoop-users,
>> >
>> > I'm curious if there is an implementation somewhere of a counter which
>> > tracks the maximum of some value across all mappers or reducers?
>> >
>> > Thanks
>> > J
>>
>>
>>
>> --
>> Harsh J
>
>



-- 
Harsh J

Re: Counters that track the max value

Posted by Jeremy Lewi <je...@lewi.us>.

Hi Harsh,

Thank you very much, that will work.

How come we can't simply create a modification of a regular mapreduce
counter which does this behind the scenes? It seems like we should just be
able to replace "+" with "max" and everything else should work?
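
To illustrate the idea, here is a minimal sketch of aggregation with a
pluggable merge operator. It is a plain-Java model, not Hadoop's Counter
API: the class MergeCounterSketch and the method aggregate are hypothetical
names, and it assumes Java 8 or later. The point is only that "max", like
"+", is associative and commutative, so per-task values can be merged in any
order.

```java
import java.util.function.LongBinaryOperator;

/** Hypothetical model: aggregate per-task values with a pluggable operator. */
public class MergeCounterSketch {

    /** Folds the per-task values into one result, starting from the identity. */
    public static long aggregate(long identity, LongBinaryOperator op, long[] perTaskValues) {
        long result = identity;
        for (long v : perTaskValues) {
            result = op.applyAsLong(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        long[] perTask = {17, 42, 11};

        // Regular counter behaviour: values are summed.
        System.out.println(aggregate(0L, Long::sum, perTask));              // prints 70

        // Same aggregation with "+" replaced by "max".
        System.out.println(aggregate(Long.MIN_VALUE, Math::max, perTask));  // prints 42
    }
}
```

Note that the identity changes along with the operator: 0 for the sum, the
smallest long for the max.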

J

On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <ha...@cloudera.com> wrote:

> Jeremy,
>
> Here's my shot at it (pardon the quick crappy code):
> https://gist.github.com/3828246
>
> Basically - you can achieve it in two ways:
>
> Requirement:  All tasks must increment the "max" designated counter
> only AFTER the max has been computed (i.e. in cleanup).
>
> 1. All tasks may use same counter name. Later, we pull per-task
> counters and determine the max at the client. (This is my quick and
> dirty implementation)
> 2. All tasks may use their own task ID (Number part) in the counter
> name, but use the same group. Later, we fetch all counters for that
> group and iterate over it to find the max. This is cleaner, and
> doesn't end up using deprecated APIs such as the above.
>
> Does this help?
>
> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <je...@lewi.us> wrote:
> > HI hadoop-users,
> >
> > I'm curious if there is an implementation somewhere of a counter which
> > tracks the maximum of some value across all mappers or reducers?
> >
> > Thanks
> > J
>
>
>
> --
> Harsh J
>

Re: Counters that track the max value

Posted by Harsh J <ha...@cloudera.com>.

Jeremy,

Here's my shot at it (pardon the quick crappy code):
https://gist.github.com/3828246

Basically - you can achieve it in two ways:

Requirement: All tasks must increment the designated "max" counter
only AFTER the task's max has been computed (i.e. in cleanup).

1. All tasks may use the same counter name. Later, we pull per-task
counters and determine the max at the client. (This is my quick and
dirty implementation.)
2. All tasks may use their own task ID (the numeric part) in the counter
name, but use the same group. Later, we fetch all counters for that
group and iterate over them to find the max. This is cleaner, and
doesn't end up using deprecated APIs the way option 1 does.
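
Option 2 can be sketched roughly as follows. This is a Hadoop-free model,
not the code from the gist: a Map stands in for the counter group, and
MaxCounterSketch, reportLocalMax, clientSideMax and the "max_" name prefix
are all made up for illustration. In a real job, the task side would
presumably call context.getCounter(group, name).increment(localMax) once in
cleanup(), and the client would iterate job.getCounters().getGroup(group).

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical model of option 2: one counter per task in a shared group. */
public class MaxCounterSketch {

    /** What a task would do once, in cleanup(), after seeing all its values. */
    public static void reportLocalMax(Map<String, Long> group, int taskId, long[] values) {
        long localMax = Long.MIN_VALUE;
        for (long v : values) {
            localMax = Math.max(localMax, v);
        }
        // A counter starts at 0, so a single increment stores the local max.
        group.merge("max_" + taskId, localMax, Long::sum);
    }

    /** What the client would do after the job: scan the group for the max. */
    public static long clientSideMax(Map<String, Long> group) {
        long max = Long.MIN_VALUE;
        for (long v : group.values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        Map<String, Long> group = new LinkedHashMap<>();
        reportLocalMax(group, 0, new long[] {3, 17, 5});
        reportLocalMax(group, 1, new long[] {42, 8});
        reportLocalMax(group, 2, new long[] {11});
        System.out.println(clientSideMax(group)); // prints 42
    }
}
```

The increment happens only once per task because counters add: incrementing
on every record would store a sum rather than a max.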

Does this help?

On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <je...@lewi.us> wrote:
> HI hadoop-users,
>
> I'm curious if there is an implementation somewhere of a counter which
> tracks the maximum of some value across all mappers or reducers?
>
> Thanks
> J

-- 
Harsh J
