Posted to user@hadoop.apache.org by Zhige Xin <xi...@gmail.com> on 2014/08/09 18:49:44 UTC

Top K words problem

I have a question about Hadoop: how can the wordcount program be modified
to output the top K words by number of occurrences?

The naive method is to count every word and then sort all of them by count,
but that takes a lot of code and is not elegant. Another approach keeps the
candidates in a TreeMap, which takes only about 100 lines and reduces the
time complexity, since only K entries have to be kept in order rather than
the whole vocabulary.
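
For reference, here is a minimal sketch of the TreeMap idea in plain Java,
without any Hadoop dependencies. The class name TopKWords and the methods
countWords and topK are hypothetical names chosen for illustration, and
countWords only stands in for the usual wordcount map/reduce phase. Breaking
ties alphabetically is an assumption of the sketch, not something the problem
requires.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class TopKWords {
    // Stand-in for the wordcount phase: counts whitespace-separated words.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the k most frequent words, most frequent first.
    // Ties on count are broken alphabetically (an assumption of this sketch).
    static List<String> topK(Map<String, Integer> counts, int k) {
        // Key: count. Value: the words seen with that count, kept sorted.
        TreeMap<Integer, TreeSet<String>> byCount = new TreeMap<>();
        int size = 0; // total number of words currently held
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            byCount.computeIfAbsent(e.getValue(), c -> new TreeSet<>()).add(e.getKey());
            size++;
            if (size > k) {
                // Evict one word from the bucket with the smallest count.
                Map.Entry<Integer, TreeSet<String>> smallest = byCount.firstEntry();
                smallest.getValue().pollLast();
                if (smallest.getValue().isEmpty()) {
                    byCount.remove(smallest.getKey());
                }
                size--;
            }
        }
        List<String> result = new ArrayList<>();
        for (TreeSet<String> words : byCount.descendingMap().values()) {
            result.addAll(words);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("a b a c b a d");
        System.out.println(topK(counts, 2));
    }
}
```

In a real job the topK logic would presumably run over the reducer's
(word, count) pairs, for example by collecting them during reduce() and
emitting the survivors from the reducer's cleanup() method. The map never
holds more than K words, so memory stays bounded by K.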

Are there any other ways? Any ideas are welcome.




Best,
Isaiah

Re: Top K words problem

Posted by Jens Scheidtmann <je...@gmail.com>.
Search for "streaming algorithms" and "stream processing" to get some ideas.

Best regards, Jens
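
The reply above does not name a particular streaming algorithm. As one
possible illustration, the sketch below uses the Misra-Gries frequent-items
summary, which is my choice of example and not something suggested in the
thread. The class name MisraGries, the method candidates, and the parameter
m (the number of counters) are hypothetical names. With m counters, any word
occurring more than n/(m+1) times in a stream of n words is guaranteed to
survive, but the stored counts are underestimates, so a second pass is needed
for exact counts.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class MisraGries {
    // Single pass over the stream using at most m counters.
    // Returns candidate heavy hitters with lower-bound counts.
    static Map<String, Integer> candidates(Iterable<String> stream, int m) {
        Map<String, Integer> counters = new HashMap<>();
        for (String word : stream) {
            if (counters.containsKey(word)) {
                // Already tracked: increment its counter.
                counters.merge(word, 1, Integer::sum);
            } else if (counters.size() < m) {
                // A free counter is available: start tracking the word.
                counters.put(word, 1);
            } else {
                // No free counter: decrement every counter, dropping zeros.
                Iterator<Map.Entry<String, Integer>> it = counters.entrySet().iterator();
                while (it.hasNext()) {
                    Map.Entry<String, Integer> e = it.next();
                    if (e.getValue() == 1) {
                        it.remove();
                    } else {
                        e.setValue(e.getValue() - 1);
                    }
                }
            }
        }
        return counters;
    }
}
```

This trades exactness for a fixed memory footprint, which may matter when
the vocabulary is too large to count in full. For a top K query, m would be
chosen somewhat larger than K and the surviving candidates recounted exactly.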
