Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/02/28 22:23:40 UTC

no complete sort

Hi all, 
here is a weird observation. The keys in the output of *ONE* reducer are ordered like this:
18166   
18169    
1817    
18171    
18172    

Why does key "1817" come after "18169"? That would make sense if the key were "18170", but it isn't! Why does this happen and, more basically, how does Hadoop tell that key1 is larger than key2? Does it compare their hash codes?

Thanks.
-Gang

Re: no complete sort

Posted by Prateek Jindal <ji...@illinois.edu>.

Hi Gang, it is sorting the keys lexicographically.

--Prateek.

-- 
Prateek Jindal
Ph.D. student,
Dept. of Computer Science,
UIUC.
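
To make the lexicographic point concrete, here is a minimal sketch in plain Java. It assumes ordinary `String` comparison as a stand-in for how Hadoop orders `Text` keys, which agrees for ASCII digit strings like the ones in this thread. The class name `LexCompare` and the helper `firstDifference` are hypothetical, made up for illustration.

```java
// Hypothetical demo class; not part of Hadoop.
class LexCompare {
    // Returns the index of the first position where a and b differ,
    // or -1 if one string is a prefix of the other (or they are equal).
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return i;
            }
        }
        return -1;
    }

    // True if a sorts before b when compared character by character.
    static boolean lexicographicallyBefore(String a, String b) {
        return a.compareTo(b) < 0;
    }

    // True if a sorts before b when both are parsed as integers.
    static boolean numericallyBefore(String a, String b) {
        return Integer.parseInt(a) < Integer.parseInt(b);
    }

    public static void main(String[] args) {
        String a = "18169";
        String b = "1817";
        int i = firstDifference(a, b);
        // The strings agree on "181" and first differ at index 3: '6' vs '7'.
        System.out.println("first difference at index " + i
                + ": '" + a.charAt(i) + "' vs '" + b.charAt(i) + "'");
        System.out.println("lexicographically before: " + lexicographicallyBefore(a, b));
        System.out.println("numerically before: " + numericallyBefore(a, b));
    }
}
```

The comparison stops at the first differing character, so the extra digit at the end of "18169" never gets a chance to matter: '6' is less than '7', and "18169" sorts first.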

Re: no complete sort

Posted by Gang Luo <lg...@yahoo.com.cn>.

Thanks to Ed and Prateek, who pointed this out in the previous mails. Yes, I use Text instead of IntWritable. The order makes sense if the keys are sorted lexicographically.

-Gang

----- Original Message ----
From: Ed Mazur <ma...@cs.umass.edu>
To: common-user@hadoop.apache.org
Sent: 2010/2/28 (Sun) 4:28:46 PM
Subject: Re: no complete sort

Hi Gang,

What's your reduce output key type? It looks like you're using Text
instead of IntWritable, causing your keys to be sorted
lexicographically instead of numerically.

Sorting is done with a comparator that defines how an arbitrary
element compares to another. Hashing serves a different purpose.

Ed


Re: no complete sort

Posted by Ed Mazur <ma...@cs.umass.edu>.

Hi Gang,

What's your reduce output key type? It looks like you're using Text
instead of IntWritable, causing your keys to be sorted
lexicographically instead of numerically.

Sorting is done with a comparator that defines how an arbitrary
element compares to another. Hashing serves a different purpose.

Ed
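
As a rough illustration of the comparator point, the sketch below sorts the five keys from the original question twice, once with a string comparator and once with a numeric one. It is plain Java with no Hadoop dependency. The assumption is that `String` natural order behaves like `Text` ordering and integer comparison like `IntWritable` ordering for these keys; `KeySortDemo` and its members are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical demo class; not part of Hadoop.
class KeySortDemo {
    // The keys from the question, deliberately shuffled.
    static final List<String> KEYS =
            Arrays.asList("18172", "1817", "18166", "18171", "18169");

    // Compares keys as strings (assumed stand-in for Text keys).
    static final Comparator<String> LEXICOGRAPHIC = Comparator.naturalOrder();

    // Compares keys by integer value (assumed stand-in for IntWritable keys).
    static final Comparator<String> NUMERIC =
            (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b));

    // Returns a sorted copy; the order depends entirely on the comparator.
    static List<String> sorted(List<String> keys, Comparator<String> order) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(order);
        return copy;
    }

    public static void main(String[] args) {
        // prints [18166, 18169, 1817, 18171, 18172]
        System.out.println(sorted(KEYS, LEXICOGRAPHIC));
        // prints [1817, 18166, 18169, 18171, 18172]
        System.out.println(sorted(KEYS, NUMERIC));
    }
}
```

The first result reproduces the order seen in the reducer output, and the second is the order one would expect from numeric keys. No hash code is consulted in either case; only the comparator decides which key comes first.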
