Posted to user@hadoop.apache.org by "laozhao0@sina.cn" <la...@sina.cn> on 2014/08/17 04:35:34 UTC
How to sort in a WordCount
Hello, I am using MapReduce to get the frequency of words in a corpus, and I want the result sorted in descending order. I can already use Hive to sort it, but how can I do this in MapReduce?
Thanks.
laozhao0@sina.cn
Reply: Re: How to sort in a WordCount
Posted by "laozhao0@sina.cn" <la...@sina.cn>.
Hi 周杰, thank you for your reply. Can partitioning get the top-k of the WordCount result? In this problem I have a lot of search keywords, and I want to know the top-k words, so I can ignore words that occur only once or twice.
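One common way to get a top-k is sketched below in plain Python, as a local simulation rather than Hadoop code. It assumes the input is the output of the WordCount job, so every word appears exactly once with its final count. Each input split keeps only its own k best candidates and a single final step merges them. The names `local_top_k`, `top_k` and the `min_count` threshold are made up for this illustration.

```python
import heapq

def local_top_k(pairs, k, min_count=1):
    """Keep the k most frequent (word, count) pairs of one split.

    Words whose final count is below min_count are dropped first,
    which covers the "ignore words seen once or twice" case.
    """
    frequent = (p for p in pairs if p[1] >= min_count)
    return heapq.nlargest(k, frequent, key=lambda p: p[1])

def top_k(splits, k, min_count=1):
    """Merge the per-split candidates into one global top-k list."""
    candidates = []
    for split in splits:
        candidates.extend(local_top_k(split, k, min_count))
    return heapq.nlargest(k, candidates, key=lambda p: p[1])

# Hypothetical WordCount output, already divided into two splits.
splits = [
    [("hadoop", 12), ("hive", 7), ("typo", 1)],
    [("mapreduce", 9), ("sort", 2), ("pig", 4)],
]
print(top_k(splits, k=3, min_count=3))
```

Because each split forwards at most k pairs, the final merge step sees very little data, so a single reducer is usually enough for this pattern.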
laozhao0@sina.cn
From: 周杰
Sent: 2014-08-17 10:54
To: user
Subject: Re: How to sort in a WordCount
Hi, laozhao
Are you going to sort all the keys? Maybe you can sort all the keys by implementing a partition function, something like a range partition. For example, suppose the keys are the numbers 1 to 100. You can design 5 partitions, [1,20], [21,40], [41,60], [61,80], [81,100], and send each key to the partition whose range contains it.
At 2014-08-17 10:35:34, "laozhao0@sina.cn" <la...@sina.cn> wrote:
Hello, I am using MapReduce to get the frequency of words in a corpus, and I want the result sorted in descending order. I can already use Hive to sort it, but how can I do this in MapReduce?
Thanks.
laozhao0@sina.cn
Re:How to sort in a WordCount
Posted by 周杰 <zh...@126.com>.
Hi, laozhao
Are you going to sort all the keys? Maybe you can sort all the keys by implementing a partition function, something like a range partition. For example, suppose the keys are the numbers 1 to 100. You can design 5 partitions, [1,20], [21,40], [41,60], [61,80], [81,100], and send each key to the partition whose range contains it.
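The range-partition idea can be sketched in plain Python as follows. This is a local simulation, not Hadoop code, and it assumes integer keys from 1 to 100 with fixed, hand-picked boundaries; the names `BOUNDS`, `range_partition` and `total_sort` are made up for the illustration. Each partition is sorted on its own, and because the ranges do not overlap, reading the partitions in order gives a globally sorted result.

```python
import bisect

# Upper bounds of the first four ranges: 1-20, 21-40, 41-60, 61-80.
# Anything above 80 falls into the fifth partition (81-100).
BOUNDS = [20, 40, 60, 80]

def range_partition(key, bounds=BOUNDS):
    """Return the index of the partition whose range contains key."""
    return bisect.bisect_left(bounds, key)

def total_sort(keys, bounds=BOUNDS):
    """Simulate one reducer per partition, then concatenate the outputs."""
    partitions = [[] for _ in range(len(bounds) + 1)]
    for key in keys:
        partitions[range_partition(key, bounds)].append(key)
    result = []
    for part in partitions:
        result.extend(sorted(part))  # each reducer sorts only its own range
    return result

print(total_sort([55, 3, 100, 20, 21, 81, 40]))
```

With real data the boundaries would normally be chosen by sampling the keys so that the partitions are roughly balanced; Hadoop's `TotalOrderPartitioner` is built around that idea.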
At 2014-08-17 10:35:34, "laozhao0@sina.cn" <la...@sina.cn> wrote:
Hello,
I am using MapReduce to get the frequency of words in a corpus,
and I want the result sorted in descending order.
I can already use Hive to sort it, but how can I do this in MapReduce?
Thanks.
laozhao0@sina.cn
Re: How to sort in a WordCount
Posted by Kai Voigt <k...@123.org>.
You need a second MapReduce job. Take your WordCount output as its input and have the mapper swap keys and values, i.e. map(word, count) => (count, word); then your reducer will get the records sorted by count. Since you won’t have too many unique words, one reducer should be fine, and you don’t have to worry about a more complex partitioner.
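A minimal sketch of that second job, written as a plain Python simulation rather than Hadoop code, might look like this. The function names and the sample data are made up for the illustration. Note that the shuffle sorts keys in ascending order by default, so the descending order asked for in the question needs a reversed comparison; in Hadoop that would typically be a custom comparator registered with `Job.setSortComparatorClass`, and here it is just `reverse=True`.

```python
from itertools import groupby

def swap_mapper(word, count):
    """map(word, count) => (count, word)"""
    yield count, word

def sort_by_count(word_counts):
    """Simulate the second job with a single reducer."""
    # Map phase: swap every pair so the count becomes the key.
    swapped = [kv for word, count in word_counts
               for kv in swap_mapper(word, count)]
    # Shuffle phase: sort by key, reversed to get descending counts.
    swapped.sort(key=lambda kv: kv[0], reverse=True)
    # Reduce phase: emit every word of each count group, swapped back.
    output = []
    for count, group in groupby(swapped, key=lambda kv: kv[0]):
        for _, word in group:
            output.append((word, count))
    return output

# Hypothetical output of the first (WordCount) job.
word_counts = [("the", 5), ("hadoop", 2), ("a", 5), ("sort", 1)]
print(sort_by_count(word_counts))
```

Words with the same count end up in the same reduce group, which is why the reducer loops over the values instead of expecting one word per key.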
Kai
On 17.08.2014 at 04:35, laozhao0@sina.cn wrote:
> Hello,
> I am using MapReduce to get the frequency of words in a corpus,
> and I want the result sorted in descending order.
> I can already use Hive to sort it, but how can I do this in MapReduce?
>
> Thanks.
>
> laozhao0@sina.cn
Kai Voigt Am Germaniahafen 1 k@123.org
24143 Kiel +49 160 96683050
Germany @KaiVoigt