Posted to mapreduce-user@hadoop.apache.org by Trevor Adams <tr...@gmail.com> on 2011/06/29 18:59:52 UTC

Reduce method called same key twice

I have a custom key which is used for a join. It contains two fields: a
boolean (is primary key) and an int (key). The hash code only looks at the
int key field, so that matching records get sent to the same reducer. Compare
places the primary-key record at the top of the list (if sorted using
compare). This works nicely, except that the reduce method is called with
Key: 1 -> a single value, Key: 1 -> another value, and so on, once for each
value. Instead of bucketing the values under a key (and some of the keys are
identical in every way), it sends 1 key and 1 value to the reducer at a time.
How do I get it to bucket, or why isn't it bucketing?

-Trevor
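
For readers following along, here is a minimal sketch of the partitioning step described above. The class and method names are hypothetical and not from the original post; the only borrowed detail is the arithmetic in partition, which mirrors what Hadoop's default HashPartitioner does with a key's hash code.

```java
// Illustrative only: PartitionSketch, keyHash and partition are made-up names.
final class PartitionSketch {
    // Assumed hash: only the int field participates, as the post describes.
    // The boolean flag is deliberately ignored.
    static int keyHash(boolean primary, int key) {
        return Integer.hashCode(key);
    }

    // Clear the sign bit, then take the remainder by the reducer count.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int reducers = 4;
        // The primary and non-primary records for key 1 get the same partition.
        System.out.println(partition(keyHash(true, 1), reducers));
        System.out.println(partition(keyHash(false, 1), reducers));
    }
}
```

Note that this only decides which reducer receives a record. Whether values are handed to a single reduce call is decided separately by the comparators.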

Re: Reduce method called same key twice

Posted by Trevor Adams <tr...@gmail.com>.

After I created a RawComparator for the key, it worked as expected.
Thanks.

-Trevor
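
The comparator itself was not posted, so the following is only a sketch of what a byte-level comparison for such a key might look like. It assumes the key is serialized as one flag byte followed by a 4-byte big-endian int (the bytes that DataOutput.writeBoolean followed by writeInt would produce). The class and method names are hypothetical; only the six-argument parameter shape is taken from Hadoop's RawComparator.compare.

```java
import java.nio.ByteBuffer;

// Illustrative only: none of these names come from the thread.
final class RawKeySketch {
    // Assumed layout: [1 byte: primary flag][4 bytes: int key, big-endian].
    static byte[] serialize(boolean primary, int key) {
        return ByteBuffer.allocate(5).put((byte) (primary ? 1 : 0)).putInt(key).array();
    }

    // Read the int key that follows the flag byte.
    static int readKey(byte[] b, int start) {
        return ByteBuffer.wrap(b, start + 1, 4).getInt();
    }

    // Grouping: records with the same int key compare as equal,
    // regardless of the primary flag.
    static int compareForGrouping(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return Integer.compare(readKey(b1, s1), readKey(b2, s2));
    }

    // Sorting: by int key, then the primary record (flag byte 1) first.
    static int compareForSorting(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int c = Integer.compare(readKey(b1, s1), readKey(b2, s2));
        if (c != 0) {
            return c;
        }
        return Integer.compare(b2[s2], b1[s1]);
    }
}
```

In a real job these two methods would live in separate classes implementing RawComparator, registered on the Job with setSortComparatorClass and setGroupingComparatorClass.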

On Wed, Jun 29, 2011 at 2:47 PM, Aaron Baff <Aa...@telescope.tv> wrote:

> I dunno, I just know that when I use a separate comparator for my custom
> key (does something similar to yours, although 2 or 3 additional secondary
> fields to group on) it works as it should.
>
> --Aaron
>
>
> -----------------------------------------------------------------------------
> From: Trevor Adams [mailto:trevoradams@gmail.com]
> Sent: Wednesday, June 29, 2011 11:34 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Reduce method called same key twice
>
> So, that kind of makes sense but why would it not group the other values
> then? There are a bunch of the exact same key (only 1 primary record, so
> only 1 that is different per set) and it is my understanding that they would
> be grouped together (without the primary key) if I didn't do anything
> different.
>
> -Trevor
> On Wed, Jun 29, 2011 at 2:07 PM, Aaron Baff <Aa...@telescope.tv>
> wrote:
> You probably need to implement a custom comparator that you use as the
> grouping comparator that compares the primary key, and then if they are the
> same compares the int part of the key.
>
> --Aaron
>
>
> -----------------------------------------------------------------------------
> From: Trevor Adams [mailto:trevoradams@gmail.com]
> Sent: Wednesday, June 29, 2011 10:00 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: Reduce method called same key twice
>
> So I have a custom Key which is used for a join. It contains two fields, a
> boolean (is primary key) and an int (key). Hashcode only looks at the key
> field, so that it gets sent to the same reducer. Compare places the pkey at
> the top of the list (if sorted using compare). This works nicely, except
> that the reduce method is called with Key: 1 -> a single value, Key: 1 ->
> another value etc. One for each value, so instead of bucketing the values to
> a key (and some of the keys are identical, in every way) it sends 1 key and
> 1 value to the reducer at a time. How do I get it to bucket or why isn't it
> bucketing?
>
> -Trevor
>
>

RE: Reduce method called same key twice

Posted by Aaron Baff <Aa...@telescope.tv>.

I don't know. I just know that when I use a separate comparator for my custom key (which does something similar to yours, although with 2 or 3 additional secondary fields to group on), it works as it should.

--Aaron

Re: Reduce method called same key twice

Posted by Trevor Adams <tr...@gmail.com>.

So, that kind of makes sense, but why would it not group the other values
then? There are a bunch of records with the exact same key (only 1 primary
record, so only 1 that is different per set), and it is my understanding that
they would be grouped together (without the primary key) if I didn't do
anything different.

-Trevor

RE: Reduce method called same key twice

Posted by Aaron Baff <Aa...@telescope.tv>.

You probably need to implement a custom comparator and use it as the grouping comparator: it compares the primary key and then, if they are the same, compares the int part of the key.

--Aaron
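
To illustrate the difference a separate grouping comparator makes, here is a small plain-Java simulation. It is not Hadoop code: JoinKey, GroupingSketch and group are hypothetical names, and the grouping comparator shown compares only the int part, which is one reading of the advice above. The loop imitates how sorted keys are split into reduce calls wherever the grouping comparator reports a difference between neighbours.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical composite join key; field names are illustrative.
final class JoinKey {
    final boolean primary; // true for the single primary record of a key
    final int key;         // the join key itself

    JoinKey(boolean primary, int key) {
        this.primary = primary;
        this.key = key;
    }
}

final class GroupingSketch {
    // Sort order: by int key, then the primary record first.
    static final Comparator<JoinKey> SORT = (a, b) -> {
        int c = Integer.compare(a.key, b.key);
        return c != 0 ? c : Boolean.compare(b.primary, a.primary);
    };

    // Grouping order: the int key only, so the flag never splits a group.
    static final Comparator<JoinKey> GROUP = (a, b) -> Integer.compare(a.key, b.key);

    // Sort, then start a new group whenever the grouping comparator
    // says the current key differs from the previous one.
    static List<List<JoinKey>> group(List<JoinKey> keys,
                                     Comparator<JoinKey> sort,
                                     Comparator<JoinKey> grouping) {
        List<JoinKey> sorted = new ArrayList<>(keys);
        sorted.sort(sort);
        List<List<JoinKey>> groups = new ArrayList<>();
        JoinKey previous = null;
        for (JoinKey k : sorted) {
            if (previous == null || grouping.compare(previous, k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<JoinKey> keys = Arrays.asList(
            new JoinKey(false, 1), new JoinKey(true, 1), new JoinKey(false, 1),
            new JoinKey(false, 2), new JoinKey(true, 2));
        System.out.println(group(keys, SORT, SORT).size());  // sort comparator reused
        System.out.println(group(keys, SORT, GROUP).size()); // separate grouping comparator
    }
}
```

With the sort comparator reused for grouping, the five sample keys fall into four groups, because the primary record never compares equal to the others. With the int-only grouping comparator they fall into two groups, each starting with its primary record.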