Posted to common-user@hadoop.apache.org by Aaron Baff <Aa...@telescope.tv> on 2010/11/12 01:29:15 UTC

Problem with custom WritableComparable

I'm having a problem with a custom WritableComparable that I created to use as a key object. I have a number of identifiers, each with a timestamp, and I want to group the records by identifier in the reducer and order them by timestamp (oldest to newest). When I use the key as originally coded, with the timestamp comparison in compareTo(), I end up with a new reduce() call for every single record, even for records with the same identifier. When I comment out the timestamp comparison (as below), so that compareTo() only returns the comparison of the two identifiers, it works perfectly and I see only one reduce() call per identifier. Have I made a wrong assumption somewhere about how it's supposed to work? Did I do something wrong?

--Aaron


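import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class IdentifierTimestampKey implements WritableComparable {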
    private String identifier = "";
    private long timestamp = 0L;

    public void write(DataOutput out) throws IOException {
        out.writeUTF(identifier);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        identifier = in.readUTF();
        timestamp = in.readLong();
    }

    @Override
    public int hashCode() {
        return identifier.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if( obj instanceof IdentifierTimestampKey ) {
            final IdentifierTimestampKey other = (IdentifierTimestampKey) obj;
            if( this.identifier == null ) {
                return other.identifier == null;
            } else {
                return this.identifier.equals(other.identifier);
            }
        }
        return false;
    }

    public int compareTo(Object obj) {
        if (obj == null) {
            throw new ClassCastException("Object is NULL and so cannot be compared!");
        }
        int ret = 0;
        if( obj instanceof IdentifierTimestampKey ) {
            final IdentifierTimestampKey other = (IdentifierTimestampKey)obj;
//            if( this.identifier.equals(other.identifier) ) {
//                if( this.timestamp < other.timestamp ) {
//                    ret = -1;
//                } else if( this.timestamp > other.timestamp ) {
//                    ret = 1;
//                }
//            } else {
                ret = this.identifier.compareTo(other.identifier);
//            }
        } else {
            throw new ClassCastException("Object is of type " + obj.getClass().getName() + " which cannot be compared to this class of type " + getClass().getName());
        }

        return ret;
    }

    public String getIdentifier() {
        return identifier;
    }

    public void setIdentifier(String identifier) {
        this.identifier = identifier;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }

    public static class Comparator extends WritableComparator {
        public Comparator() {
          super(IdentifierTimestampKey.class, true);
        }
    }
}
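
For reference, the commented-out comparison above amounts to ordering by identifier first and by timestamp second. The following is only a minimal plain-Java sketch of that ordering, with no Hadoop classes involved; CompositeOrderSketch, Key and FULL_ORDER are hypothetical names standing in for IdentifierTimestampKey and its compareTo(). It shows that under this ordering two keys with the same identifier but different timestamps do not compare as equal, which is why they count as distinct keys.

```java
import java.util.Comparator;

// Hypothetical stand-in for IdentifierTimestampKey, without any Hadoop dependency.
final class CompositeOrderSketch {

    static final class Key {
        final String identifier;
        final long timestamp;

        Key(String identifier, long timestamp) {
            this.identifier = identifier;
            this.timestamp = timestamp;
        }
    }

    // Orders by identifier first, then by timestamp (oldest to newest).
    static final Comparator<Key> FULL_ORDER = new Comparator<Key>() {
        @Override
        public int compare(Key a, Key b) {
            int byIdentifier = a.identifier.compareTo(b.identifier);
            if (byIdentifier != 0) {
                return byIdentifier;
            }
            return Long.compare(a.timestamp, b.timestamp);
        }
    };

    public static void main(String[] args) {
        Key older = new Key("id-1", 100L);
        Key newer = new Key("id-1", 200L);

        // Same identifier, different timestamps: the keys are NOT equal under FULL_ORDER.
        System.out.println("older sorts before newer: " + (FULL_ORDER.compare(older, newer) < 0));

        // Only an identical identifier AND timestamp compares as 0.
        System.out.println("identical keys compare equal: "
                + (FULL_ORDER.compare(older, new Key("id-1", 100L)) == 0));
    }
}
```

Both lines print true. The ordering itself is what the sort needs; the open question is only how the reducer decides where one group ends and the next begins.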

RE: Problem with custom WritableComparable

Posted by Aaron Baff <Aa...@telescope.tv>.

>On Thu, Nov 11, 2010 at 4:29 PM, Aaron Baff <Aa...@telescope.tv> wrote:
>
>> I'm having a problem with a custom WritableComparable that I created
>> to use as a Key object. I basically have a number of identifiers with
>> a timestamp, and I'm wanting to group the identifiers together in the
>> reducer, and order the records by the timestamp (oldest to newest)
>
>
>The reduce is called once for each distinct key. Fortunately, there is an option, called the "grouping" comparator, that lets you group keys differently going into the reduce.
>Look at the SecondarySort example for how to do it. Also note that your partitioner needs to make sure that the partition is picked based only on the primary key. (This can be effected by making the hashCode depend only on it, if you use the HashPartitioner.)
>
>-- Owen
>

Thanks Owen. Once I set the correct setOutputKeyComparatorClass() and setOutputValueGroupingComparator(), based on the SecondarySort example and http://markmail.org/message/7gonm3kiasyh2xnf#query:setOutputKeyComparatorClass+page:3+mid:esn3lgzyx3ag26cy+state:results (which I found through a helpful Google search), I got it to work. I didn't need to create a specific Partitioner, since my hashCode() function already uses only the part of the key that I want to partition by (just the identifier), so I just used the HashPartitioner.

--Aaron
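
The partitioning point above can be illustrated with a small plain-Java sketch. It assumes that HashPartitioner picks a partition as (hashCode & Integer.MAX_VALUE) % numberOfReducers, which is my understanding of its behavior; PartitionSketch, keyHash and partitionFor are hypothetical names used only for this illustration, and no Hadoop classes are involved.

```java
// Plain-Java illustration of hash partitioning on the identifier only.
final class PartitionSketch {

    // Mirrors hashCode() in IdentifierTimestampKey: the timestamp is deliberately ignored.
    static int keyHash(String identifier, long timestamp) {
        return identifier.hashCode();
    }

    // Assumed HashPartitioner formula: clear the sign bit, then take the remainder.
    static int partitionFor(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int reducers = 4;
        int first = partitionFor(keyHash("id-1", 100L), reducers);
        int second = partitionFor(keyHash("id-1", 200L), reducers);

        // Same identifier, different timestamps: both records go to the same reducer.
        System.out.println("same partition: " + (first == second));
    }
}
```

If the hash also depended on the timestamp, records for one identifier could be spread over several reducers, and no grouping comparator could bring them back together.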

Re: Problem with custom WritableComparable

Posted by Owen O'Malley <om...@apache.org>.

On Thu, Nov 11, 2010 at 4:29 PM, Aaron Baff <Aa...@telescope.tv> wrote:

> I'm having a problem with a custom WritableComparable that I created to use
> as a Key object. I basically have a number of identifiers with a timestamp,
> and I'm wanting to group the identifiers together in the reducer, and order
> the records by the timestamp (oldest to newest)

The reduce is called once for each distinct key. Fortunately, there is an
option, called the "grouping" comparator, that lets you group keys
differently going into the reduce. Look at the SecondarySort example for how
to do it. Also note that your partitioner needs to make sure that the
partition is picked based only on the primary key. (This can be effected by
making the hashCode depend only on it, if you use the HashPartitioner.)

-- Owen
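
The interaction between the sort order and the grouping comparator can be sketched in plain Java. This is only a simplified model of the idea, not Hadoop's actual implementation: keys are sorted with one comparator, and a new group (standing in for one reduce() call) starts whenever a second comparator reports that two adjacent keys differ. GroupingSketch, Key, SORT_ORDER, GROUP_BY_IDENTIFIER and group are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

final class GroupingSketch {

    // Hypothetical stand-in for IdentifierTimestampKey.
    static final class Key {
        final String identifier;
        final long timestamp;

        Key(String identifier, long timestamp) {
            this.identifier = identifier;
            this.timestamp = timestamp;
        }
    }

    // Sort order: identifier first, then timestamp (oldest to newest).
    static final Comparator<Key> SORT_ORDER = new Comparator<Key>() {
        @Override
        public int compare(Key a, Key b) {
            int byIdentifier = a.identifier.compareTo(b.identifier);
            return byIdentifier != 0 ? byIdentifier : Long.compare(a.timestamp, b.timestamp);
        }
    };

    // Grouping order: identifier only, so the timestamp never splits a group.
    static final Comparator<Key> GROUP_BY_IDENTIFIER = new Comparator<Key>() {
        @Override
        public int compare(Key a, Key b) {
            return a.identifier.compareTo(b.identifier);
        }
    };

    // Sorts the keys, then starts a new group each time the grouping
    // comparator says the current key differs from the previous one.
    static List<List<Key>> group(List<Key> keys, Comparator<Key> sortOrder, Comparator<Key> grouping) {
        List<Key> sorted = new ArrayList<Key>(keys);
        Collections.sort(sorted, sortOrder);
        List<List<Key>> groups = new ArrayList<List<Key>>();
        Key previous = null;
        for (Key key : sorted) {
            if (previous == null || grouping.compare(previous, key) != 0) {
                groups.add(new ArrayList<Key>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = Arrays.asList(
                new Key("a", 300L), new Key("b", 50L), new Key("a", 100L), new Key("a", 200L));

        // Grouping with the full sort order: every record is its own group.
        System.out.println(group(keys, SORT_ORDER, SORT_ORDER).size());          // prints 4

        // Grouping by identifier only: one group per identifier, still sorted by timestamp.
        System.out.println(group(keys, SORT_ORDER, GROUP_BY_IDENTIFIER).size()); // prints 2
    }
}
```

With identifier-only grouping, the first group holds the three "a" records in timestamp order (100, 200, 300), which is the behavior the original question asks for.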