Posted to common-user@hadoop.apache.org by Savannah Beckett <sa...@yahoo.com> on 2010/12/26 06:05:52 UTC
How to do Secondary Sort on a String and a float?
I am writing a secondary sort on a String key and a float value. I am
following the example in
mapred/src/examples/org/apache/hadoop/examples/SecondarySort.java in the Hadoop
package. That example is for a pair of integers. I did a lot of research online,
but most of the examples I found still use the old API. It seems that with the
new API I have to implement the RawComparator interface, which means I need to
write the byte-level compare function no matter what.
I have a problem with this code:
public static class FirstGroupingComparator
    implements RawComparator<IntPair> {
  @Override
  public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    return WritableComparator.compareBytes(b1, s1, Integer.SIZE/8,
                                           b2, s2, Integer.SIZE/8);
  }
  @Override
  public int compare(IntPair o1, IntPair o2) {
    int l = o1.getFirst();
    int r = o2.getFirst();
    return l == r ? 0 : (l < r ? -1 : 1);
  }
}
How do I write the code inside the first compare function? What should I put as
the length of the String and float (primitive type) in the compareBytes
function? Does anyone have any examples for a pair of String and float?
Thanks. Merry Christmas.
Re: How to do Secondary Sort on a String and a float?
Posted by Harsh J <qw...@gmail.com>.
Hi,
You can use WritableComparator for "Writable" serializations. Docs
here: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/WritableComparator.html
The issue lies in how you encode your <String, Float> pair.
If you know the size of each field (or have a marker byte between
them, etc.), you can pick out the bytes of just the field you need
(String or Float) and call compareBytes on that range. "s1" and "s2"
are the start offsets, and "l1" and "l2" are the number of bytes to
read from those offsets, in the byte[] arrays passed in for the two
serialized "Writable" objects.
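
To make the offset arithmetic concrete, here is a minimal sketch in plain Java with no Hadoop dependency. It assumes the pair is serialized with DataOutput.writeUTF followed by DataOutput.writeFloat, so each record is a 2-byte length, the string bytes, then 4 float bytes. The class StringFloatRaw and its methods are hypothetical names for illustration only. In a real comparator, WritableComparator.compareBytes, WritableComparator.readUnsignedShort and WritableComparator.readFloat cover the same ground as the hand-written helpers. If you use Text for the string, the length prefix is a variable-length int instead, so the offsets differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Hadoop. Assumed record layout:
// [2-byte length][string bytes][4-byte float]
class StringFloatRaw {

    // Serialize one pair the way a custom Writable.write(DataOutput) might.
    static byte[] encode(String s, float f) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.writeFloat(f);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Unsigned 2-byte big-endian length prefix written by writeUTF.
    static int utfLength(byte[] b, int s) {
        return ((b[s] & 0xff) << 8) | (b[s + 1] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, bytes treated as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return l1 - l2;
    }

    // Decode the 4 big-endian bytes written by writeFloat.
    static float readFloat(byte[] b, int off) {
        int bits = ((b[off] & 0xff) << 24) | ((b[off + 1] & 0xff) << 16)
                 | ((b[off + 2] & 0xff) << 8) | (b[off + 3] & 0xff);
        return Float.intBitsToFloat(bits);
    }

    // Grouping comparison: the string part only, skipping the length prefix.
    static int compareFirst(byte[] b1, int s1, byte[] b2, int s2) {
        return compareBytes(b1, s1 + 2, utfLength(b1, s1),
                            b2, s2 + 2, utfLength(b2, s2));
    }

    // Sort comparison: string first, then the float that follows it.
    static int compareFull(byte[] b1, int s1, byte[] b2, int s2) {
        int c = compareFirst(b1, s1, b2, s2);
        if (c != 0) {
            return c;
        }
        float f1 = readFloat(b1, s1 + 2 + utfLength(b1, s1));
        float f2 = readFloat(b2, s2 + 2 + utfLength(b2, s2));
        return Float.compare(f1, f2);
    }
}
```

Note that the float is decoded before it is compared. Its length is 4 bytes (Float.SIZE/8), but calling compareBytes directly on those 4 bytes orders negative values after positive ones, because the sign bit is the highest bit of the first byte. The string bytes compare in byte order, which matches String.compareTo for ASCII keys but not necessarily for all Unicode text.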
You can also, perhaps, deserialize the whole byte stream (via your
Writable.readFields()) and then compare object-wise, but this would
be slower. Byte-to-byte comparisons are faster, which is the reason
RawComparator exists.
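
For comparison, the deserialize-then-compare route might look roughly like the sketch below. It is plain Java that mirrors the shape of a Writable (write and readFields) without depending on Hadoop. StringFloatPair, toBytes, fromBytes and compareRaw are hypothetical names, and the writeUTF/writeFloat encoding is an assumption. As far as I know, WritableComparator does much the same thing by default when it is constructed to create key instances.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical pair type shaped like a Writable; not a Hadoop class.
class StringFloatPair implements Comparable<StringFloatPair> {
    private String first = "";
    private float second;

    StringFloatPair() {
    }

    StringFloatPair(String first, float second) {
        this.first = first;
        this.second = second;
    }

    // Same role as Writable.write(DataOutput).
    void write(DataOutput out) throws IOException {
        out.writeUTF(first);
        out.writeFloat(second);
    }

    // Same role as Writable.readFields(DataInput).
    void readFields(DataInput in) throws IOException {
        first = in.readUTF();
        second = in.readFloat();
    }

    // Object-wise ordering: string first, then float.
    @Override
    public int compareTo(StringFloatPair other) {
        int c = first.compareTo(other.first);
        return c != 0 ? c : Float.compare(second, other.second);
    }

    byte[] toBytes() {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            write(new DataOutputStream(buf));
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static StringFloatPair fromBytes(byte[] b, int s, int l) {
        try {
            StringFloatPair p = new StringFloatPair();
            p.readFields(new DataInputStream(new ByteArrayInputStream(b, s, l)));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Raw-signature comparison that rebuilds both objects on every call.
    static int compareRaw(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        return fromBytes(b1, s1, l1).compareTo(fromBytes(b2, s2, l2));
    }
}
```

Every call allocates two objects and two streams, which is the overhead the byte-level comparison avoids. On the other hand, this version cannot get an offset wrong.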
Avro has a neat serialization format, and I prefer using it over plain
Writables. Working with a "Schema" is much easier.
--
Harsh J
www.harshj.com