Posted to user@hadoop.apache.org by "laozhao0@sina.cn" <la...@sina.cn> on 2014/08/17 04:35:34 UTC

How to sort in a WordCount

Hello, I am using MapReduce to get the frequency of words in a corpus, and I want the result sorted in descending order. I can already sort it with Hive, but how can I do this with MapReduce?
Thanks.


laozhao0@sina.cn


Reply: Re: How to sort in a WordCount

Posted by "laozhao0@sina.cn" <la...@sina.cn>.

Hi 周杰, thank you for your reply. Can partitioning give me the top-k of the WordCount result? In this problem I have a lot of search keywords and I want to know the top-k words, so words that occur only once or twice can be ignored.
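Range partitioning by itself only routes records to reducers; picking the k most frequent words is a separate selection step. Below is a minimal plain-Java sketch of that step with no Hadoop dependency, assuming the input is the finished WordCount output (one count per word). `topK`, `k` and `minCount` are hypothetical names. In a real job this logic could run in a single reducer, or in each mapper (keeping a local top-k) with one reducer merging the partial lists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopKSketch {
    // Returns the k most frequent words whose count is at least minCount,
    // ordered from highest to lowest count. A min-heap capped at k entries
    // keeps memory bounded no matter how many distinct words there are.
    public static List<Map.Entry<String, Integer>> topK(Map<String, Integer> counts, int k, int minCount) {
        PriorityQueue<Map.Entry<String, Integer>> heap =
            new PriorityQueue<Map.Entry<String, Integer>>((a, b) -> Integer.compare(a.getValue(), b.getValue()));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() < minCount) {
                continue; // skip rare words, e.g. those seen only once or twice
            }
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the smallest count currently kept
            }
        }
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.getValue(), a.getValue())); // descending
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("hadoop", 42);
        counts.put("hive", 17);
        counts.put("typo", 1);
        counts.put("mapreduce", 30);
        // Top 2 among words seen at least 3 times: prints 42 hadoop, then 30 mapreduce
        for (Map.Entry<String, Integer> e : topK(counts, 2, 3)) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Because each word appears exactly once in the WordCount output, the global top-k is always contained in the union of the per-mapper top-k lists, which is why the two-level variant gives the same answer.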


laozhao0@sina.cn
From: 周杰
Sent: 2014-08-17 10:54
To: user
Subject: Re: How to sort in a WordCount

Hi laozhao,
Are you going to sort all the keys? Maybe you can sort all the keys by implementing a partition function, something like range partitioning. For example, if the keys are the numbers 1 to 100, you can design 5 partitions, [1-20], [21-40], [41-60], [61-80] and [81-100], and route each number to the partition whose range contains it.

At 2014-08-17 10:35:34, "laozhao0@sina.cn" <la...@sina.cn> wrote:
 
Hello, I am using MapReduce to get the frequency of words in a corpus, and I want the result sorted in descending order. I can already sort it with Hive, but how can I do this with MapReduce?
Thanks.


laozhao0@sina.cn




Re:How to sort in a WordCount

Posted by 周杰 <zh...@126.com>.
Hi laozhao,
     Are you going to sort all the keys? Maybe you can sort all the keys by implementing a partition function, something like range partitioning. For example, if the keys are the numbers 1 to 100, you can design 5 partitions, [1-20], [21-40], [41-60], [61-80] and [81-100], and route each number to the partition whose range contains it.
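A minimal plain-Java sketch of the range-partition function described above, assuming integer keys in [1, maxKey] split into equal-width ranges. `getPartition` borrows the name of Hadoop's `Partitioner.getPartition`, but this is not the Hadoop class, and `maxKey` and `descendingPartition` are hypothetical. Reducer i writes the i-th part file, so for a descending result overall the highest range should go to partition 0 (`descendingPartition`) and each reducer must also sort its keys in descending order. When the key range is not known in advance, Hadoop's `TotalOrderPartitioner`, together with `InputSampler`, can derive the split points from a sample instead.

```java
public class RangePartitionSketch {
    // Maps a key in [1, maxKey] to one of numPartitions equal-width ranges.
    // With maxKey = 100 and 5 partitions: 1-20 -> 0, 21-40 -> 1, ..., 81-100 -> 4.
    public static int getPartition(int key, int maxKey, int numPartitions) {
        int width = (maxKey + numPartitions - 1) / numPartitions; // ceiling of maxKey / numPartitions
        int p = (key - 1) / width;
        // Clamp out-of-range keys into the first or last partition
        return Math.min(Math.max(p, 0), numPartitions - 1);
    }

    // Same ranges, but the highest range goes to partition 0, so the part
    // files read in order run from the largest keys down to the smallest.
    public static int descendingPartition(int key, int maxKey, int numPartitions) {
        return numPartitions - 1 - getPartition(key, maxKey, numPartitions);
    }

    public static void main(String[] args) {
        int[] keys = {1, 20, 21, 55, 100};
        for (int k : keys) {
            System.out.println(k + " -> partition " + getPartition(k, 100, 5)
                + " (descending: " + descendingPartition(k, 100, 5) + ")");
        }
    }
}
```

Equal-width ranges are the simplest choice but can be badly skewed for word counts, where most words have small counts; sampled split points avoid that.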


At 2014-08-17 10:35:34, "laozhao0@sina.cn" <la...@sina.cn> wrote:

Hello,
I am using MapReduce to get the frequency of words in a corpus,
and I want the result sorted in descending order.
I can already sort it with Hive, but how can I do this with MapReduce?

Thanks.


laozhao0@sina.cn

Re: How to sort in a WordCount

Posted by Kai Voigt <k...@123.org>.
You need a second MapReduce job. Take your WordCount output as its input and have the mapper swap keys and values, i.e. map(word, count) => (count, word); the reducer will then receive the records sorted by count. Since you won't have too many unique words, one reducer should be fine, and you don't have to worry about a more complex partitioner.
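A plain-Java simulation of the second job, with no Hadoop dependency: the map step turns each (word, count) into (count, word), and a `TreeMap` stands in for the shuffle's sort by key. Hadoop sorts map output keys in ascending order by default, so for the descending result asked for here the comparator is reversed. In a real job that would correspond to a descending sort comparator, e.g. `job.setSortComparatorClass(LongWritable.DecreasingComparator.class)` if the counts are emitted as `LongWritable` (an assumption; the stock WordCount emits `IntWritable`), plus `job.setNumReduceTasks(1)` for the single reducer. `sortByCountDescending` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SwapSortSketch {
    // Simulates job 2: map (word, count) -> (count, word), sort by count
    // descending, then "reduce" by emitting one "count<TAB>word" line per word.
    public static List<String> sortByCountDescending(Map<String, Integer> wordCounts) {
        // Keyed by count in reverse order; words sharing a count are grouped,
        // like the values handed to a single reduce call
        TreeMap<Integer, List<String>> byCount =
            new TreeMap<Integer, List<String>>(Comparator.reverseOrder());
        for (Map.Entry<String, Integer> e : wordCounts.entrySet()) {
            // Map step: emit (count, word)
            byCount.computeIfAbsent(e.getValue(), c -> new ArrayList<>()).add(e.getKey());
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> e : byCount.entrySet()) {
            // Reduce step: one call per distinct count
            for (String word : e.getValue()) {
                out.add(e.getKey() + "\t" + word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("hive", 1);
        counts.put("mapreduce", 5);
        counts.put("sort", 3);
        for (String line : sortByCountDescending(counts)) {
            System.out.println(line);
        }
    }
}
```

Words that share a count arrive in the same reduce call. Hadoop does not guarantee their order within that call, whereas this sketch simply keeps insertion order.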

Kai

On 17.08.2014 at 04:35, laozhao0@sina.cn wrote:

> Hello,
> I am using MapReduce to get the frequency of words in a corpus,
> and I want the result sorted in descending order.
> I can already sort it with Hive, but how can I do this with MapReduce?
> 
> Thanks.
> 
> laozhao0@sina.cn

Kai Voigt			Am Germaniahafen 1			k@123.org
					24143 Kiel					+49 160 96683050
					Germany						@KaiVoigt