Posted to common-user@hadoop.apache.org by Jimmy Lin <ji...@umd.edu> on 2008/02/08 15:25:53 UTC
Question about key sorting interaction effects
Hi guys,
I was wondering if someone could explain the possible interaction
effects between the different methods available to control key sorting.
Based on my understanding, there are three separate knobs:
- a WritableComparable's compareTo method
- registering a WritableComparator optimization
- setOutputKeyComparatorClass method in JobConf
So here's my question: what happens if these each define a different
sort order?
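To make the question concrete, here is a small, self-contained Java model of choosing one comparator from the three knobs. It is not Hadoop code. The class and method names (SortComparatorResolver, registerComparator, resolve) are hypothetical. The precedence it encodes is an assumption made for illustration: a job-level comparator first, then a comparator registered for the key class, then the key's own compareTo.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of picking one sort comparator from three "knobs".
// None of these names are Hadoop APIs.
final class SortComparatorResolver {
    // Stands in for comparators registered per key class.
    private static final Map<Class<?>, Comparator<?>> REGISTERED = new HashMap<>();

    static <K> void registerComparator(Class<K> keyClass, Comparator<K> cmp) {
        REGISTERED.put(keyClass, cmp);
    }

    // Assumed precedence: job-level comparator, then a registered
    // comparator for the key class, then the key's own compareTo.
    @SuppressWarnings("unchecked")
    static <K extends Comparable<K>> Comparator<K> resolve(
            Class<K> keyClass, Comparator<K> jobComparator) {
        if (jobComparator != null) {
            return jobComparator;
        }
        Comparator<K> registered = (Comparator<K>) REGISTERED.get(keyClass);
        if (registered != null) {
            return registered;
        }
        return Comparator.naturalOrder(); // falls back to compareTo
    }
}
```

Under this assumed precedence, exactly one comparator is resolved per job. The knobs could disagree in practice only if some code path bypassed the resolved comparator.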
To be more concrete, in a recent application I inadvertently set an
output key comparator whose ordering differed from the
WritableComparable's natural ordering (as defined by its compareTo).
Running the application on small data sets led to the behavior I
expected: keys sorted according to the output key comparator. However,
I got unexpected results with larger data sets, which leads me to
suspect that different methods are used to sort at different times...
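One plausible reading of the "fine on small data, wrong on large data" symptom, offered as an assumption rather than something this thread confirms, involves how input size affects sorting. Small inputs may be sorted in a single pass. Larger inputs may be sorted in several runs that are later merged. The toy Java sketch below uses a hypothetical class MismatchedMerge, with plain integers standing in for keys. It shows what happens if runs sorted with one comparator are merged by code that assumes another: the result is ordered by neither.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical toy: not Hadoop's merge code.
final class MismatchedMerge {
    // Standard two-way merge; only correct if both runs are sorted by cmp.
    static <T> List<T> merge(List<T> a, List<T> b, Comparator<? super T> cmp) {
        List<T> out = new ArrayList<>(a.size() + b.size());
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (cmp.compare(a.get(i), b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // True if xs is in non-decreasing order under cmp.
    static <T> boolean isSorted(List<T> xs, Comparator<? super T> cmp) {
        for (int k = 1; k < xs.size(); k++) {
            if (cmp.compare(xs.get(k - 1), xs.get(k)) > 0) return false;
        }
        return true;
    }
}
```

For example, runs [5, 3, 1] and [6, 4, 2] (each sorted in descending order) merge to [6, 5, 4, 3, 2, 1] with a descending comparator. With an ascending one, they merge to [5, 3, 1, 6, 4, 2], which is sorted under neither order.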
Thanks in advance for the response!
-Jimmy
RE: Question about key sorting interaction effects
Posted by Joydeep Sen Sarma <js...@facebook.com>.
It would help if you could paste your output key comparator. It should override the other things listed in your mail, but the thing to watch out for is that one has to define both the compare(byte[] ...) and the compare(WritableComparable ...) functions. I am wondering whether one was omitted (and inherited from WritableComparator?).
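The pitfall described above can be modeled without Hadoop. In this hypothetical sketch, IntKeyComparator stands in for a comparator with two entry points. One is an object-level compare. The other is a byte-level compare that decodes 4-byte big-endian ints directly. The sketch assumes the inherited byte-level method does not call the object-level one, as with a hand-optimized raw comparator. Under that assumption, a subclass that overrides only the object-level method sorts one way when comparing objects and another way when comparing serialized bytes.

```java
import java.nio.ByteBuffer;

// Hypothetical model with two compare entry points; not Hadoop's classes.
class IntKeyComparator {
    // Object path: compares deserialized keys in ascending order.
    int compare(Integer a, Integer b) {
        return Integer.compare(a, b);
    }

    // Raw path: "optimized" to read ints straight from the bytes,
    // without going through the object path above.
    int compare(byte[] b1, int s1, byte[] b2, int s2) {
        int a = ByteBuffer.wrap(b1, s1, 4).getInt();
        int b = ByteBuffer.wrap(b2, s2, 4).getInt();
        return Integer.compare(a, b);
    }

    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }
}

// Bug: overrides only the object path; the raw path stays ascending.
class HalfReversedComparator extends IntKeyComparator {
    @Override
    int compare(Integer a, Integer b) {
        return Integer.compare(b, a);
    }
}

// Fix: both paths define the same descending order.
class ReversedComparator extends IntKeyComparator {
    @Override
    int compare(Integer a, Integer b) {
        return Integer.compare(b, a);
    }

    @Override
    int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return super.compare(b2, s2, b1, s1); // swap arguments to reverse
    }
}
```

Overriding both methods, as ReversedComparator does, keeps the two paths consistent. Swapping the arguments rather than negating the result is a common way to reverse a comparator.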
-----Original Message-----
From: Jimmy Lin [mailto:jimmylin@umd.edu]
Sent: Fri 2/8/2008 6:25 AM
To: core-user@hadoop.apache.org
Subject: Question about key sorting interaction effects