Posted to common-user@hadoop.apache.org by Gang Luo <lg...@yahoo.com.cn> on 2010/02/03 06:10:04 UTC

sort at reduce side

Hi all,
I would like to understand sorting on the reduce side in more detail.

The intermediate result generated on the map side is stored as a map file, which consists of two sub-files: an index file and a data file. The index file stores the keys and points to the corresponding records in the data file. My understanding is that when the intermediate result (even if it is only part of each mapper's output) is shuffled to the reducer, it is still kept as a map file. If so, to sort the data efficiently, the reducer would only need to read the index part of each spill (which is a map file) and sort the keys, instead of reading whole records from disk and sorting them.

Does the reducer actually work the way I expect?

-Gang



Re: sort at reduce side

Posted by Edward Capriolo <ed...@gmail.com>.
2010/2/3 Srigurunath Chakravarthi <sr...@yahoo-inc.com>:
> Hi Gang,
>
>>kept as a map file. If so, to sort the data efficiently, the reducer
>>would only need to read the index part of each spill (which is a map file) and
>>sort the keys, instead of reading whole records from disk and sorting them.
>
>  AFAIK, no. Reducers always fetch the map output data itself, not the indexes (even when the data is on the local node, where an index might be sufficient).
>
> Regards,
> Sriguru
>

With 0.20 and the TotalOrderPartitioner, isn't reduce-side sorting
possible now? Is that support we can/should add to Hive?
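
For readers unfamiliar with the idea behind a total-order partitioner: keys are routed to reducers by key range rather than by hash, so reducer 0 gets the smallest keys, reducer 1 the next range, and so on. Concatenating the reducers' sorted outputs in partition order then gives a globally sorted result. The sketch below only illustrates that range lookup in plain Java. It is not Hadoop's TotalOrderPartitioner class; `RangePartitionSketch`, `partitionFor`, and the split points "g" and "p" are hypothetical names and values chosen for the example.

```java
import java.util.Arrays;

final class RangePartitionSketch {

    /**
     * Returns the reducer index for a key, given sorted split points.
     * With n split points there are n + 1 partitions; a key equal to a
     * split point goes to the partition on its right (an assumption of
     * this sketch).
     */
    static int partitionFor(String key, String[] sortedSplitPoints) {
        int pos = Arrays.binarySearch(sortedSplitPoints, key);
        // binarySearch returns -(insertionPoint) - 1 when the key is absent.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    public static void main(String[] args) {
        String[] splits = {"g", "p"};   // hypothetical split points: 3 reducers
        for (String key : new String[] {"apple", "grape", "zebra"}) {
            System.out.println(key + " -> reducer " + partitionFor(key, splits));
        }
    }
}
```

In a real job the split points would normally come from sampling the input so that each reducer receives a similar share of the data.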

RE: sort at reduce side

Posted by Srigurunath Chakravarthi <sr...@yahoo-inc.com>.
Hi Gang,

>kept as a map file. If so, to sort the data efficiently, the reducer
>would only need to read the index part of each spill (which is a map file) and
>sort the keys, instead of reading whole records from disk and sorting them.

 AFAIK, no. Reducers always fetch the map output data itself, not the indexes (even when the data is on the local node, where an index might be sufficient).

Regards,
Sriguru
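
To illustrate the answer above: each map output segment reaches the reducer already sorted by key, so the reduce side can combine segments with a k-way merge over whole records instead of sorting keys taken from an index. The Java sketch below is a simplified illustration under that assumption, not Hadoop's actual merge code. `SegmentMergeSketch`, `Rec`, and `Cursor` are hypothetical names, and the segments are held in memory for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class SegmentMergeSketch {

    /** A full key/value record, standing in for one map output record. */
    static final class Rec {
        final String key;
        final String value;
        Rec(String key, String value) { this.key = key; this.value = value; }
        @Override public String toString() { return key + "=" + value; }
    }

    /** Cursor over one fetched segment; each segment is already sorted by key. */
    private static final class Cursor {
        final Iterator<Rec> rest;
        Rec head;
        Cursor(Iterator<Rec> it) { this.rest = it; this.head = it.next(); }
        boolean advance() {
            if (rest.hasNext()) { head = rest.next(); return true; }
            return false;
        }
    }

    /** Merges key-sorted segments into one key-sorted list of whole records. */
    static List<Rec> merge(List<List<Rec>> segments) {
        // The heap orders cursors by the key of their current head record.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
            (Cursor a, Cursor b) -> a.head.key.compareTo(b.head.key));
        for (List<Rec> segment : segments) {
            if (!segment.isEmpty()) {
                heap.add(new Cursor(segment.iterator()));
            }
        }
        List<Rec> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.head);
            if (smallest.advance()) {
                heap.add(smallest);   // re-insert with its new head record
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Rec>> segments = Arrays.asList(
            Arrays.asList(new Rec("apple", "1"), new Rec("melon", "1")),
            Arrays.asList(new Rec("banana", "1"), new Rec("zebra", "1")),
            Arrays.asList(new Rec("apple", "2"), new Rec("cherry", "1")));
        System.out.println(merge(segments));
    }
}
```

The merge never sorts a segment again; it only compares the current head of each segment, which is why pre-sorted map output keeps the reduce-side cost down even though full records are read.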
