Posted to common-user@hadoop.apache.org by Stan Rosenberg <sr...@proclivitysystems.com> on 2011/08/15 03:20:14 UTC

WritableComparable

Hi Folks,

After much poking around I am still unable to determine why I am seeing
'reduce' being called twice with the "same" key.
Recall from my previous email that "sameness" is determined by 'compareTo'
of my custom key type.

AFAIK, the default WritableComparator invokes 'compareTo' for any two keys
which are being ordered during sorting and merging.
Is it somehow possible that a bitwise comparator is used for the spilled map
output rather than the default WritableComparator?

I am out of clues, short of studying the "shuffling" code.  If anyone can
suggest some further debugging steps, don't be shy. :)

Thanks!!!

stan

Re: WritableComparable

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
Thanks for all the help on this issue.  It turned out to be a very simple
problem with my 'compareTo' implementation.
The ordering was symmetric but _not_ transitive.

stan

On Tue, Aug 16, 2011 at 4:47 PM, Chris White <ch...@gmail.com> wrote:

> Can you copy the contents of your parent Writable readFields and write
> methods (not the ones you've already posted)?
>
> Another thing you could try is if you know you have two identical keys, can
> you write a unit test to examine the result of compareTo for two instances
> to confirm the correct behavior (even going as far as serializing and
> deserializing before the comparison)
>
> Finally, just to confirm, you don't have any group or order comparators
> registered?
>
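Stan doesn't post the corrected code, so here is a hedged reconstruction of how the ordering described later in the thread ("orders by the source values if the id values are different, otherwise by the id values") fails transitivity, together with one transitive alternative (order by source first, then by id). The class, field, and constant names below are illustrative, not the original code:

```java
import java.util.Comparator;

// Hypothetical reconstruction of the broken ordering discussed in this thread.
public class NonTransitiveDemo {
    public enum Source { A, B }

    public static final class BrokenKey {
        public final Source source;
        public final long id;
        public BrokenKey(Source source, long id) { this.source = source; this.id = id; }
    }

    // Symmetric but NOT transitive: keys with equal ids always compare as
    // equal, while keys with different ids are ordered by source alone.
    public static final Comparator<BrokenKey> BROKEN = (x, y) -> {
        if (x.id != y.id) return x.source.compareTo(y.source);
        return Long.compare(x.id, y.id); // ids are equal here, so always 0
    };

    // A transitive fix: a total order on (source, id) pairs.
    public static final Comparator<BrokenKey> FIXED =
        Comparator.comparing((BrokenKey k) -> k.source).thenComparingLong(k -> k.id);

    public static final BrokenKey A1 = new BrokenKey(Source.A, 1);
    public static final BrokenKey A2 = new BrokenKey(Source.A, 2);
    public static final BrokenKey B1 = new BrokenKey(Source.B, 1);

    public static void main(String[] args) {
        System.out.println(BROKEN.compare(A1, A2)); // 0: ids differ, sources equal
        System.out.println(BROKEN.compare(A1, B1)); // 0: ids equal
        System.out.println(BROKEN.compare(A2, B1)); // negative: A < B
        // A1 "equals" A2 and A1 "equals" B1, yet A2 < B1 -- not transitive,
        // so merge-sort during the shuffle can legally leave "equal" keys apart.

        System.out.println(FIXED.compare(A1, B1) < 0); // source-first ordering
    }
}
```

An intransitive comparator violates the `Comparable` contract, and the sort/merge machinery is allowed to produce any grouping in that case, which matches the symptom of "equal" keys reaching 'reduce' separately.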

Re: WritableComparable

Posted by Chris White <ch...@gmail.com>.
Can you copy the contents of your parent Writable readFields and write
methods (not the ones you've already posted)?

Another thing you could try is if you know you have two identical keys, can
you write a unit test to examine the result of compareTo for two instances
to confirm the correct behavior (even going as far as serializing and
deserializing before the comparison)

Finally, just to confirm, you don't have any group or order comparators
registered?
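The round-trip unit test Chris suggests could look like the sketch below. It is self-contained rather than Hadoop-dependent: `UuidKey` stands in for the UUID-wrapping key from this thread, and plain `write`/`readFields` methods on `DataOutput`/`DataInput` stand in for the Writable interface:

```java
import java.io.*;
import java.util.UUID;

// Sketch of a serialize/deserialize round-trip test for a key's compareTo.
// UuidKey is illustrative, not the original class from the thread.
public class RoundTripTest {
    public static final class UuidKey implements Comparable<UuidKey> {
        UUID id;
        public UuidKey() {}
        public UuidKey(UUID id) { this.id = id; }

        public void write(DataOutput out) throws IOException {
            out.writeLong(id.getMostSignificantBits());
            out.writeLong(id.getLeastSignificantBits());
        }
        public void readFields(DataInput in) throws IOException {
            id = new UUID(in.readLong(), in.readLong());
        }
        @Override public int compareTo(UuidKey o) { return id.compareTo(o.id); }
    }

    // Serialize the key to bytes, then deserialize into a fresh instance.
    public static UuidKey roundTrip(UuidKey key) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        key.write(new DataOutputStream(bytes));
        UuidKey copy = new UuidKey();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        return copy;
    }

    public static void main(String[] args) throws IOException {
        UuidKey original = new UuidKey(UUID.randomUUID());
        // After a serialize + deserialize cycle, the copy must still compare
        // equal to the original; a nonzero result points at write/readFields.
        System.out.println(original.compareTo(roundTrip(original)));
    }
}
```

If the round trip preserves `compareTo` equality but duplicates still appear in 'reduce', the problem is more likely in the comparator contract itself than in serialization.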

Re: WritableComparable

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
On Tue, Aug 16, 2011 at 6:14 AM, Chris White <ch...@gmail.com> wrote:

> Are you using a hash partitioner? If so, make sure the hash value of the
> writable is not calculated using the hashCode value of the enum - use the
> ordinal value instead. The hashcode value of an enum is different for each
> jvm.
>

Thanks for the tip.  I am using a hash partitioner (it's the default), but my
hash value is not based on an enum value.  In any case, the keys in question
get hashed to the same reducer.

Best,

stan

Re: WritableComparable

Posted by Chris White <ch...@gmail.com>.
Are you using a hash partitioner? If so, make sure the hash value of the
writable is not calculated using the hashCode value of the enum - use the
ordinal value instead. The hashcode value of an enum is different for each
jvm.
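Chris's point is that `Enum.hashCode` is identity-based (inherited from `Object`), so it can differ between the mapper and reducer JVMs, while `ordinal()` is stable. A hedged sketch of a partitioner-safe hash, with illustrative names (`Source`, `StableHashKey`) rather than the original code:

```java
// Sketch: build hashCode from the enum's ordinal, not the enum's hashCode.
public class StableHashKey {
    public enum Source { X, Y }

    public final Source source;
    public final long id;

    public StableHashKey(Source source, long id) {
        this.source = source;
        this.id = id;
    }

    @Override public int hashCode() {
        // source.ordinal() is identical in every JVM; source.hashCode() is
        // identity-based and is not, so a hash partitioner (hash % numReducers)
        // built on it could route the "same" key to different reducers.
        return 31 * source.ordinal() + Long.hashCode(id);
    }

    public static void main(String[] args) {
        // Deterministic across runs and JVMs.
        System.out.println(new StableHashKey(Source.X, 5L).hashCode());
    }
}
```

The same reasoning applies to any field whose `hashCode` falls back to `Object`'s identity hash: the partitioner needs a value derived purely from the key's serialized content.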

Re: WritableComparable

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
On Sun, Aug 14, 2011 at 10:25 PM, Joey Echeverria <jo...@cloudera.com> wrote:

> What are the types of key1 and key2? What does the readFields() method
> look like?


The type of key1 is essentially a wrapper for java.util.UUID.
Here is its readFields:

public void readFields(DataInput in) throws IOException {
  id = new UUID(in.readLong(), in.readLong());
}

So, it reconstitutes the UUID by deserializing two longs.  The 'compareTo'
method of this key type delegates to java.util.UUID.compareTo.
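One detail worth knowing about that delegation (not necessarily the bug here): `java.util.UUID.compareTo` compares the most- and least-significant halves as signed longs, so its order disagrees with the unsigned, byte-lexicographic order a bitwise comparator would produce over the serialized form. A minimal illustration:

```java
import java.util.UUID;

// Shows that UUID.compareTo (signed long comparison) and unsigned
// byte-lexicographic ordering of the serialized bits can disagree.
public class UuidOrderDemo {
    public static void main(String[] args) {
        UUID highBit = new UUID(0x8000000000000000L, 0L); // top bit set: negative as a signed long
        UUID one     = new UUID(1L, 0L);

        // Signed comparison: Long.MIN_VALUE < 1, so compareTo says highBit < one...
        System.out.println(highBit.compareTo(one) < 0);

        // ...but comparing the raw big-endian bits unsigned, highBit > one.
        System.out.println(Long.compareUnsigned(
            highBit.getMostSignificantBits(), one.getMostSignificantBits()) > 0);
    }
}
```

This matters for the earlier question about bitwise comparators: if sorting ever used raw-byte order while grouping used `compareTo` (or vice versa), the two orders would not agree for UUIDs with the high bit set.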

The type of key2 wraps a different id, one that fits into a long.  In
addition to an id, it also stores an enum which designates the "source" of
this id.
Here is its readFields:

public void readFields(DataInput in) throws IOException {
  source = Source.values()[in.readByte() & 0xFF];
  id = in.readLong();
}

The source is an enum value which is serialized by writing its ordinal.
(There are only two possible enum values, hence only one byte.)
The 'compareTo' method of this key type orders by the source values if the
id values are different, otherwise by the id values.

Re: WritableComparable

Posted by Joey Echeverria <jo...@cloudera.com>.
What are the types of key1 and key2? What does the readFields() method
look like?

-Joey

On Sun, Aug 14, 2011 at 10:07 PM, Stan Rosenberg
<sr...@proclivitysystems.com> wrote:
> On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria <jo...@cloudera.com> wrote:
>
>> Does your compareTo() method test object pointer equality? If so, you could
>> be getting burned by Hadoop reusing Writable objects.
>
>
> Yes, but only the equality between enum values.  Interestingly, when
> 'reduce' is called there are three instances of the "same" key.
> Two instances are correctly merged, and they both come from the same mapper.
> The other instance comes from a different mapper and, for some reason, does
> not get merged.  I see the key and the values (corresponding to the two
> merged instances) passed as arguments to 'reduce'; then, in a subsequent
> 'reduce' call, I see the key and the value corresponding to the third
> instance.
>
> For completeness, here is my 'Key.compareTo':
>
> public int compareTo(Key o) {
>   if (this.type != o.type) {
>       // Type.X < Type.Y
>       return (this.type == Type.X ? -1 : 1);
>   }
>   // otherwise, delegate
>   if (this.type == Type.X) {
>      return this.key1.compareTo(o.key1);
>   } else {
>      return this.key2.compareTo(o.key2);
>   }
> }
>
> The 'type' field is an enum with two possible values, say X and Y.  Key is
> essentially a union type; i.e., at any given time
> it's the values in key1 or key2 that are being compared (depending on the
> 'type' value).
>



-- 
Joseph Echeverria
Cloudera, Inc.
443.305.9434

Re: WritableComparable

Posted by Stan Rosenberg <sr...@proclivitysystems.com>.
On Sun, Aug 14, 2011 at 9:33 PM, Joey Echeverria <jo...@cloudera.com> wrote:

> Does your compareTo() method test object pointer equality? If so, you could
> be getting burned by Hadoop reusing Writable objects.


Yes, but only the equality between enum values.  Interestingly, when
'reduce' is called there are three instances of the "same" key.
Two instances are correctly merged, and they both come from the same mapper.
The other instance comes from a different mapper and, for some reason, does
not get merged.  I see the key and the values (corresponding to the two
merged instances) passed as arguments to 'reduce'; then, in a subsequent
'reduce' call, I see the key and the value corresponding to the third
instance.

For completeness, here is my 'Key.compareTo':

public int compareTo(Key o) {
   if (this.type != o.type) {
       // Type.X < Type.Y
       return (this.type == Type.X ? -1 : 1);
   }
   // otherwise, delegate
   if (this.type == Type.X) {
      return this.key1.compareTo(o.key1);
   } else {
      return this.key2.compareTo(o.key2);
   }
}

The 'type' field is an enum with two possible values, say X and Y.  Key is
essentially a union type; i.e., at any given time
it's the values in key1 or key2 that are being compared (depending on the
'type' value).

Re: WritableComparable

Posted by Joey Echeverria <jo...@cloudera.com>.
Does your compareTo() method test object pointer equality? If so, you could
be getting burned by Hadoop reusing Writable objects.

-Joey
On Aug 14, 2011 9:20 PM, "Stan Rosenberg" <sr...@proclivitysystems.com>
wrote:
> Hi Folks,
>
> After much poking around I am still unable to determine why I am seeing
> 'reduce' being called twice with the "same" key.
> Recall from my previous email that "sameness" is determined by 'compareTo'
> of my custom key type.
>
> AFAIK, the default WritableComparator invokes 'compareTo' for any two keys
> which are being ordered during sorting and merging.
> Is it somehow possible that a bitwise comparator is used for the spilled map
> output rather than the default WritableComparator?
>
> I am out of clues, short of studying the "shuffling" code. If anyone can
> suggest some further debugging steps, don't be shy. :)
>
> Thanks!!!
>
> stan