Posted to common-user@hadoop.apache.org by Zhige Xin <xi...@gmail.com> on 2014/08/09 18:49:44 UTC
Top K words problem
I have a question about Hadoop: how can I modify the WordCount program to
output the top K words by number of occurrences?
The naive method is to count every word and then sort the results, but that
takes a lot of code and is not elegant. Another approach uses a TreeMap data
structure, which takes only 100 lines and reduces the time complexity.
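To make the TreeMap idea concrete, here is a minimal sketch in plain Java with no Hadoop classes, so it runs on its own. The class and method names (TopKWords, countWords, topK) are made up for illustration. It uses a TreeSet of (word, count) entries instead of a TreeMap keyed by count, an assumption made so that words with equal counts do not overwrite each other. The set never holds more than K entries, so selection costs O(n log K) rather than the O(n log n) of a full sort.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class TopKWords {
    /** Counts whitespace-separated words, ignoring case. */
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : text.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    /** Returns the k most frequent words, most frequent first. */
    static List<String> topK(Map<String, Long> counts, int k) {
        // Orders entries from worst to best: lower count first; on equal
        // counts the alphabetically later word is treated as worse.
        Comparator<Map.Entry<String, Long>> worstFirst = (a, b) -> {
            int byCount = Long.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };
        TreeSet<Map.Entry<String, Long>> best = new TreeSet<>(worstFirst);
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            best.add(new AbstractMap.SimpleImmutableEntry<>(e));
            if (best.size() > k) {
                best.pollFirst(); // evict the current worst entry
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> e : best.descendingSet()) {
            result.add(e.getKey());
        }
        return result;
    }
}
```

In a MapReduce job the same bounded set could presumably be kept inside each task and emitted once the task finishes, with a single final reducer merging the partial top-K lists. That wiring is not shown here.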
Are there any other ways? Any ideas are welcome.
Best,
Isaiah
Re: Top K words problem
Posted by Jens Scheidtmann <je...@gmail.com>.
Search for "streaming algorithms" and "stream processing" to get some ideas.
Best regards, Jens
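As one example of what such a search turns up, here is a hedged sketch of the Misra-Gries frequent-items summary, a classic streaming algorithm. The reply above does not name a specific algorithm, so this choice is an assumption made for illustration, and the names MisraGries, summarize and capacity are hypothetical. It makes a single pass and keeps at most `capacity` counters. Any word that occurs more than n / (capacity + 1) times in a stream of n words is guaranteed to remain in the summary, and the stored counts never exceed the true counts.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MisraGries {
    /**
     * One pass over the stream using at most `capacity` counters.
     * The returned counts are lower bounds on the true frequencies.
     */
    static Map<String, Integer> summarize(Iterable<String> stream, int capacity) {
        Map<String, Integer> counters = new HashMap<>();
        for (String item : stream) {
            if (counters.containsKey(item)) {
                // Already tracked: increment its counter.
                counters.merge(item, 1, Integer::sum);
            } else if (counters.size() < capacity) {
                // Free slot available: start tracking this item.
                counters.put(item, 1);
            } else {
                // No free slot: decrement every counter, dropping those at zero.
                Iterator<Map.Entry<String, Integer>> it = counters.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<String, Integer> e = it.next();
                    if (e.getValue() == 1) {
                        it.remove();
                    } else {
                        e.setValue(e.getValue() - 1);
                    }
                }
            }
        }
        return counters;
    }
}
```

The summary is approximate: it only yields candidates for the heavy hitters, so getting exact counts for the top K would need a second pass over the data that counts just those candidates.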