Posted to common-user@hadoop.apache.org by Marshall Schor <ms...@schor.com> on 2009/05/01 04:25:57 UTC

Implementing compareTo in user-written keys where one extends the other is error prone

Hi.  I had difficulties in getting Reduce sorting to work - it took me a good
part of a day to figure out what was going wrong, so I'm sharing this in hopes
of learning something from the community or getting hadoop improved to avoid
this kind of error for future users.

I have 2 key classes, one holds a String, the other one extends that, and adds a
boolean.

I implemented the first key class (let's call it Super)

public class Super implements WritableComparable<Super> {
  . . .
  public int compareTo(Super o) {
    // sort on string value
    . . .
  }
}

I implemented the 2nd key class (let's call it Sub)

public class Sub extends Super {
  . . .
  public int compareTo(Sub o) {
    // sort on boolean value
    . . .
    // if equal, use the super:
    ... else
      return super.compareTo(o);
  }
}


With this setup, I used the "Sub" class as a mapper output key, and
expected the sort on the boolean value to happen first, then for equal
values there, the sort on the string values.

What actually happened was that the sort on the boolean value was
skipped completely, and only the sort on the string was done.

The reason for this is that (in the 0.19.1 release) the WritableComparator
instance that is created (using the defaults - no custom Comparator)
knows the class is "Sub" and, on a key value it created, calls the
compareTo method, passing it the other key.  Both of these
keys are of type Sub.  However, they are passed via this code in
WritableComparator:

 public int compare(WritableComparable a, WritableComparable b) {
    return a.compareTo(b);
  }

Java uses the interface spec for WritableComparable that was declared,
in this case WritableComparable<Super>, and infers that the arg type for
the compareTo is Super.  So it "skips" calling the compareTo in Sub, and
just calls the one in Super.
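
To make the mechanism concrete, here is a minimal sketch that reproduces the
behavior without Hadoop. It assumes plain java.lang.Comparable as a stand-in
for WritableComparable, and the names SuperKey, SubKey and OverloadDemo are
hypothetical, not taken from the real code. In Java terms, compareTo(SubKey)
is an overload rather than an override of compareTo(SuperKey), so a call made
through the interface never reaches it.

```java
// Hypothetical stand-ins for the Super and Sub key classes. Comparable is
// used in place of WritableComparable so the sketch runs without Hadoop.
class SuperKey implements Comparable<SuperKey> {
  final String s;

  SuperKey(String s) { this.s = s; }

  @Override
  public int compareTo(SuperKey o) {
    return s.compareTo(o.s);            // sort on string value
  }
}

class SubKey extends SuperKey {
  final boolean flag;

  SubKey(String s, boolean flag) { super(s); this.flag = flag; }

  // Overload, not override: the parameter type is SubKey, not SuperKey.
  // Putting @Override on this method would be a compile error.
  public int compareTo(SubKey o) {
    if (flag != o.flag) {
      return Boolean.compare(flag, o.flag);   // sort on boolean value
    }
    return super.compareTo(o);                // if equal, use the super
  }
}

class OverloadDemo {
  // Same shape as the compare method in WritableComparator: the static
  // types hide the fact that both arguments are SubKey instances.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int compare(Comparable a, Comparable b) {
    return a.compareTo(b);
  }

  public static void main(String[] args) {
    SubKey x = new SubKey("b", false);
    SubKey y = new SubKey("a", true);
    // Direct call with SubKey static types: the boolean is compared first.
    System.out.println(x.compareTo(y) < 0);
    // Call through the interface: only the strings are compared.
    System.out.println(compare(x, y) > 0);
  }
}
```

Both lines print true: the same two keys order one way when called directly
and the opposite way when called through the interface.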

The workaround is to change the signature of Sub's compareTo method to
match the spec in the interface, namely it has to take the Super as an
argument, and then cast it to Sub.
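
A minimal sketch of that workaround, again assuming plain
java.lang.Comparable in place of WritableComparable so that it runs without
Hadoop. The names FixedSuper, FixedSub and OverrideDemo are hypothetical.
Note that the cast throws ClassCastException if a non-Sub key is ever passed
in, which this sketch does not guard against.

```java
// Hypothetical stand-ins for the Super and Sub key classes. Comparable is
// used in place of WritableComparable so the sketch runs without Hadoop.
class FixedSuper implements Comparable<FixedSuper> {
  final String s;

  FixedSuper(String s) { this.s = s; }

  @Override
  public int compareTo(FixedSuper o) {
    return s.compareTo(o.s);            // sort on string value
  }
}

class FixedSub extends FixedSuper {
  final boolean flag;

  FixedSub(String s, boolean flag) { super(s); this.flag = flag; }

  // Same parameter type as the interface method, so this is a true
  // override and @Override compiles.
  @Override
  public int compareTo(FixedSuper o) {
    FixedSub other = (FixedSub) o;      // ClassCastException if o is not a FixedSub
    if (flag != other.flag) {
      return Boolean.compare(flag, other.flag);   // sort on boolean value
    }
    return super.compareTo(o);                    // if equal, use the super
  }
}

class OverrideDemo {
  // Same shape as the compare method in WritableComparator.
  @SuppressWarnings({"rawtypes", "unchecked"})
  static int compare(Comparable a, Comparable b) {
    return a.compareTo(b);
  }

  public static void main(String[] args) {
    FixedSub x = new FixedSub("b", false);
    FixedSub y = new FixedSub("a", true);
    // The interface call now reaches FixedSub.compareTo, so the boolean
    // is compared first.
    System.out.println(compare(x, y) < 0);
  }
}
```

Keeping @Override on the subclass method is a cheap guard here: if the
signature drifts back to taking the subclass type, the compiler rejects it.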

This seems like a very error prone design.  Am I doing something wrong,
or can this be improved so that this kind of error is avoided?

-Marshall Schor


Re: Implementing compareTo in user-written keys where one extends the other is error prone

Posted by Marshall Schor <ms...@schor.com>.
Thanks for the tip.  I'll look into it - it doesn't look too hard to do in my
case.  -Marshall

Owen O'Malley wrote:
> If you use custom key types, you really should be defining a
> RawComparator. It will perform much much better.
>
> -- Owen
>
>

Re: Implementing compareTo in user-written keys where one extends the other is error prone

Posted by Owen O'Malley <om...@apache.org>.
If you use custom key types, you really should be defining a  
RawComparator. It will perform much much better.

-- Owen