Posted to user@mahout.apache.org by Chih-Hsien Wu <ch...@gmail.com> on 2013/11/22 22:44:47 UTC

Canopy threshold limitation

Just out of curiosity: is there a limit on the threshold values for the Canopy
algorithm? Are they simply set according to the user's preferred
inter-cluster distances, or are they limited by how much memory is
available to run the job?
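For reference, Canopy clustering uses two distance thresholds, T1 > T2. Every point within T1 of a canopy center joins that canopy, and every point within T2 is removed from the pool of future centers. Below is a minimal single-machine sketch of that classic scheme. It is not Mahout's implementation, and the function name `canopy`, Euclidean distance and "take the first remaining point as the next center" are all assumptions made for illustration. It suggests why the threshold choice is tied to memory: a smaller T2 removes fewer points per pass, so more canopies are produced and held.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def canopy(points: List[Point], t1: float, t2: float) -> List[Tuple[Point, List[Point]]]:
    """Classic canopy clustering; returns (center, members) pairs. Requires t1 > t2."""
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    remaining = list(points)
    canopies = []
    while remaining:
        # Hypothetical choice: take the first remaining point as the new center.
        center = remaining.pop(0)
        members = [center]
        keep = []
        for p in remaining:
            d = euclidean(center, p)
            if d < t1:
                members.append(p)  # loosely assigned to this canopy
            if d >= t2:
                keep.append(p)  # may still seed or join later canopies
        remaining = keep
        canopies.append((center, members))
    return canopies


if __name__ == "__main__":
    pts = [(float(i),) for i in range(10)]
    for t2 in (2.5, 1.5, 0.5):
        print(t2, len(canopy(pts, t1=3.0, t2=t2)))
```

On ten evenly spaced points with T1 = 3, lowering T2 from 2.5 to 1.5 to 0.5 grows the canopy count from 4 to 5 to 10. Every canopy in this sketch is kept in memory, so a very small T2 approaches one canopy per point.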

Re: Canopy threshold limitation

Posted by Chih-Hsien Wu <ch...@gmail.com>.
Hey Suneel, thanks for the reply. I'm trying to create hierarchical
clusters via a top-down approach, and I'm caught in a trade-off between a
lower canopy threshold and running out of heap memory. Streaming k-means
sounds ideal for the top-level clustering. What are the major differences
between Streaming k-means and k-means, other than speed and lower memory
usage? In other words, what are the pros and cons?


On Fri, Nov 22, 2013 at 5:30 PM, Suneel Marthi <su...@yahoo.com> wrote:

> The threshold is based on the user's preference for inter-cluster
> distances. If you are running out of memory, I suggest increasing the
> JVM memory settings.
>
> I'm not sure what you are trying to accomplish, but if you are looking
> for a first cut at clustering, I suggest you look at the new Streaming
> k-means that's part of Mahout 0.8.
>
> See
> http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means
> for the steps.

Re: Canopy threshold limitation

Posted by Suneel Marthi <su...@yahoo.com>.
The threshold is based on the user's preference for inter-cluster distances. If you are running out of memory, I suggest increasing the JVM memory settings.

I'm not sure what you are trying to accomplish, but if you are looking for a first cut at clustering, I suggest you look at the new Streaming k-means that's part of Mahout 0.8.

See http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means for the steps.
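On the memory difference raised in this thread: streaming k-means makes one pass over the data and keeps only a bounded, weighted set of intermediate centroids, which a conventional k-means can then reduce to k clusters. Below is a minimal sketch of that one-pass step in the style of facility-location sketching. It is not Mahout's code. The names `streaming_sketch`, `Centroid`, `max_centroids`, `cutoff` and `beta` are hypothetical, and the "weight times distance over cutoff" rule for opening a new centroid is an assumption chosen for illustration.

```python
import math
import random
from typing import List, Sequence


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (assumed metric for this sketch)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Centroid:
    """A weighted centroid whose center is the mean of the points it absorbed."""

    def __init__(self, point: Sequence[float], weight: float = 1.0):
        self.center = list(point)
        self.weight = weight

    def absorb(self, point: Sequence[float], weight: float = 1.0) -> None:
        total = self.weight + weight
        self.center = [(c * self.weight + p * weight) / total
                       for c, p in zip(self.center, point)]
        self.weight = total


def streaming_sketch(points, max_centroids: int, cutoff: float,
                     beta: float = 2.0, seed: int = 0) -> List[Centroid]:
    """One pass over `points`, keeping at most `max_centroids` weighted centroids."""
    if max_centroids < 1 or cutoff <= 0 or beta <= 1:
        raise ValueError("need max_centroids >= 1, cutoff > 0, beta > 1")
    rng = random.Random(seed)
    sketch: List[Centroid] = []

    def add(point, weight):
        if not sketch:
            sketch.append(Centroid(point, weight))
            return
        nearest = min(sketch, key=lambda c: dist(c.center, point))
        d = dist(nearest.center, point)
        # Far-away (or heavy) items tend to open a new centroid;
        # nearby ones are merged into the nearest existing centroid.
        if rng.random() < min(weight * d / cutoff, 1.0):
            sketch.append(Centroid(point, weight))
        else:
            nearest.absorb(point, weight)

    for p in points:
        add(p, 1.0)
        # Over budget: raise the cutoff and re-sketch the existing centroids.
        while len(sketch) > max_centroids:
            cutoff *= beta
            old = sketch[:]
            sketch.clear()
            rng.shuffle(old)
            for c in old:
                add(c.center, c.weight)
    return sketch
```

The code shows the trade-off directly. Memory is bounded by `max_centroids` rather than by the number of points. However, the result depends on random choices and on the order of the stream, so it is an approximation that still needs a final clustering pass.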






On Friday, November 22, 2013 4:45 PM, Chih-Hsien Wu <ch...@gmail.com> wrote:
 
Just out of curiosity. Is there a threshold limitation for canopy
algorithm? Is it just defined by the user's preference based on the
inter-cluster distances? or perhaps it is just limited by how much memory
allowed to execute them?