Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/02/28 22:23:40 UTC
no complete sort
Hi all,
Here is a weird observation. The keys in the output of *ONE* reducer are ordered like this:
18166
18169
1817
18171
18172
Why does key "1817" come after "18169"? It would make sense if that key were "18170", but it isn't! Why does this happen and, basically, how does Hadoop tell that key1 is larger than key2? Does it compare their hash codes?
Thanks.
-Gang
Re: no complete sort
Posted by Prateek Jindal <ji...@illinois.edu>.
Hi Gang, it is sorting the keys lexicographically.
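To make the difference concrete, here is a minimal sketch in plain Java (no Hadoop classes) that sorts the five keys from the question once as strings and once as numbers. The class and method names are made up for illustration, and String comparison is only a stand-in for how a text key type would behave on these ASCII digit keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LexVsNumeric {
    // Sort the keys as text: compared character by character, left to right.
    static List<String> sortAsText(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        Collections.sort(copy); // uses String.compareTo, i.e. lexicographic order
        return copy;
    }

    // Sort the same keys by their numeric value instead.
    static List<String> sortAsNumbers(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort((a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("18172", "1817", "18169", "18171", "18166");
        // Text order: "1817" lands between "18169" and "18171".
        System.out.println(sortAsText(keys));
        // Numeric order: 1817 comes first because it is the smallest number.
        System.out.println(sortAsNumbers(keys));
    }
}
```

In text order, "18169" and "1817" agree on "181" and differ at the fourth character, where '6' sorts before '7', so "18169" comes first regardless of length.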
--Prateek.
--
Prateek Jindal
Ph.D. student,
Dept. of Computer Science,
UIUC.
Re: no complete sort
Posted by Gang Luo <lg...@yahoo.com.cn>.
Thanks to Ed and Prateek, who pointed this out in their previous mails. Yes, I use Text instead of IntWritable. The order makes sense if the keys are sorted lexicographically.
-Gang
----- Original Message ----
From: Ed Mazur <ma...@cs.umass.edu>
To: common-user@hadoop.apache.org
Sent: 2010/2/28 (Sun) 4:28:46 PM
Subject: Re: no complete sort
Hi Gang,
What's your reduce output key type? It looks like you're using Text
instead of IntWritable, causing your keys to be sorted
lexicographically instead of numerically.
Sorting is done with a comparator that defines how an arbitrary
element compares to another. Hashing serves a different purpose.
Ed
Re: no complete sort
Posted by Ed Mazur <ma...@cs.umass.edu>.
Hi Gang,
What's your reduce output key type? It looks like you're using Text
instead of IntWritable, causing your keys to be sorted
lexicographically instead of numerically.
Sorting is done with a comparator that defines how an arbitrary
element compares to another. Hashing serves a different purpose.
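As a rough illustration of the two roles, here is a plain-Java sketch (no Hadoop classes). BYTEWISE is a hypothetical comparator that orders keys by their UTF-8 bytes, which is my assumption about how a text key would be ordered. partitionFor is a hypothetical helper modeled on the usual hash-modulo scheme for picking a reducer; it uses String.hashCode, which need not match the hash of a Hadoop key type.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

public class CompareVsHash {
    // Hypothetical comparator: orders two keys by their UTF-8 bytes, left to right.
    static final Comparator<String> BYTEWISE = (a, b) -> {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff); // compare as unsigned bytes
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length; // a key that is a prefix of the other sorts first
    };

    // Hypothetical helper: a hash decides WHICH reducer gets a key, not the order.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        // Negative result: "18169" sorts before "1817" under byte-wise comparison.
        System.out.println(BYTEWISE.compare("18169", "1817"));
        // The hash only yields a bucket index in [0, numReducers).
        System.out.println(partitionFor("1817", 4));
    }
}
```

The comparator answers "which key comes first", while the hash only answers "which bucket does this key belong to", so hash codes say nothing about the order seen inside one reducer's output.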
Ed