Posted to common-user@hadoop.apache.org by Jimmy Lin <ji...@umd.edu> on 2008/02/08 15:25:53 UTC

Question about key sorting interaction effects

Hi guys,

I was wondering if someone could explain the possible interaction
effects between the different methods available to control key sorting.
Based on my understanding, there are three separate knobs:

- a WritableComparable's compareTo method
- registering a WritableComparator optimization
- setOutputKeyComparatorClass method in JobConf

So here's my question: what happens if these each define a different
sort order?

To be more concrete, in a recent application I inadvertently defined an
output key comparator whose ordering was different from the
WritableComparable's natural ordering (as defined by its compareTo).
Running the application on small data sets led to (my) expected
behavior: a sort order as defined by the output key comparator. However,
I got unanticipated results with larger data sets, which leads me to
suspect that different methods are used to sort at different times...
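
One way to picture the suspected failure mode is the plain-Java sketch
below. It uses no Hadoop classes and does not claim to be Hadoop's actual
code path. It only assumes, hypothetically, that chunks of keys are sorted
with one ordering and later merged with a different one. The class name
SortMergeMismatch and the NATURAL and CUSTOM orderings are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortMergeMismatch {
    // Hypothetical stand-ins for two sort orders that disagree.
    static final Comparator<Integer> NATURAL = Comparator.naturalOrder();
    static final Comparator<Integer> CUSTOM = Comparator.reverseOrder();

    // Sort each run on its own with sortOrder, then do a standard
    // two-way merge of the sorted runs using mergeOrder.
    static List<Integer> sortThenMerge(List<Integer> runA, List<Integer> runB,
                                       Comparator<Integer> sortOrder,
                                       Comparator<Integer> mergeOrder) {
        List<Integer> a = new ArrayList<>(runA);
        List<Integer> b = new ArrayList<>(runB);
        a.sort(sortOrder);
        b.sort(sortOrder);

        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            if (mergeOrder.compare(a.get(i), b.get(j)) <= 0) {
                out.add(a.get(i++));
            } else {
                out.add(b.get(j++));
            }
        }
        while (i < a.size()) out.add(a.get(i++));
        while (j < b.size()) out.add(b.get(j++));
        return out;
    }

    // True if every adjacent pair is in non-decreasing order under 'order'.
    static boolean isSorted(List<Integer> xs, Comparator<Integer> order) {
        for (int k = 1; k < xs.size(); k++) {
            if (order.compare(xs.get(k - 1), xs.get(k)) > 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> runA = Arrays.asList(5, 1, 9);
        List<Integer> runB = Arrays.asList(4, 8, 2);

        List<Integer> consistent = sortThenMerge(runA, runB, CUSTOM, CUSTOM);
        List<Integer> mixed = sortThenMerge(runA, runB, CUSTOM, NATURAL);

        System.out.println(consistent + " sorted by CUSTOM: " + isSorted(consistent, CUSTOM));
        System.out.println(mixed + " sorted by CUSTOM: " + isSorted(mixed, CUSTOM));
    }
}
```

Under that assumption, a data set small enough to fit in a single run is
never merged, so it would come out in the custom order, while a larger
one that is split into several runs comes out in neither order.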

Thanks in advance for the response!

-Jimmy

RE: Question about key sorting interaction effects

Posted by Joydeep Sen Sarma <js...@facebook.com>.
It would help if you could paste your output key comparator. It should override the other things listed in your mail, but the thing to watch out for is that one has to define both the compare(byte[] ...) and the compare(WritableComparable ...) functions. I am wondering whether one was omitted (and inherited from WritableComparator)?
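
To illustrate what "define both" means, here is a minimal plain-Java sketch. It is not a Hadoop class: DescendingIntComparator and its serialize helper are hypothetical, and the six-argument method only mirrors the shape of a byte-level compare over serialized keys. The idea is that the byte-level method decodes the keys and delegates to the object-level one, so the two cannot drift apart.

```java
import java.nio.ByteBuffer;

public class DescendingIntComparator {
    // Object-level comparison: descending order on deserialized keys.
    public int compare(Integer a, Integer b) {
        return Integer.compare(b, a);
    }

    // Byte-level comparison over serialized keys. Each key is assumed
    // to be a 4-byte big-endian int starting at offset s1 / s2.
    // Delegating to the object-level method keeps both orderings identical.
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int x = ByteBuffer.wrap(b1, s1, l1).getInt();
        int y = ByteBuffer.wrap(b2, s2, l2).getInt();
        return compare(Integer.valueOf(x), Integer.valueOf(y));
    }

    // Hypothetical helper: serialize an int as 4 big-endian bytes.
    static byte[] serialize(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static void main(String[] args) {
        DescendingIntComparator c = new DescendingIntComparator();
        int[][] pairs = { {1, 2}, {7, 7}, {-3, 10}, {100, -100} };
        for (int[] p : pairs) {
            byte[] left = serialize(p[0]);
            byte[] right = serialize(p[1]);
            int byObject = Integer.signum(c.compare(Integer.valueOf(p[0]), Integer.valueOf(p[1])));
            int byBytes = Integer.signum(c.compare(left, 0, left.length, right, 0, right.length));
            System.out.println(p[0] + " vs " + p[1] + ": object=" + byObject + " bytes=" + byBytes);
        }
    }
}
```

If only one of the two methods were overridden and the other inherited a different ordering, the two columns printed above would disagree for some pairs, which is the kind of mismatch I am asking about.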

