Posted to user@mahout.apache.org by Marko Dinić <ma...@nissatech.com> on 2014/10/02 11:45:54 UTC

Re: Streaming K Means

Suneel,

I thank you again for your answer.

I'm trying to implement a kind of cluster-based anomaly detection. 
For that, I need to cluster the normal examples. Then, when a new 
example enters the system, I need to assign it to the nearest centroid 
(by calculating the distance between each existing centroid and the new 
example), and I also need the distances from the points already in that 
cluster to its centroid.
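
To make the idea concrete, here is a rough sketch in plain Python (not 
Mahout code). The function names, the Euclidean distance, and the 
threshold rule (flag a point if it lies farther from the centroid than 
`factor` times the farthest normal member) are my own assumptions for 
illustration, not something taken from Mahout:

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_centroid(point, centroids):
    # Return (index, distance) of the centroid closest to the point.
    distances = [euclidean(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]

def is_anomaly(point, centroids, clusters, factor=1.5):
    # clusters[i] holds the normal points assigned to centroids[i].
    # Assumed rule: anomalous if farther than factor * max member distance.
    index, dist = nearest_centroid(point, centroids)
    member_dists = [euclidean(p, centroids[index]) for p in clusters[index]]
    return dist > factor * max(member_dists)
```

So what I need from the clustering output is exactly the inputs of 
`is_anomaly`: the centroids, and for each centroid the points (or at 
least their distances) that were assigned to it.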

I could use K Means for that, but I'm hoping to get better results 
using Streaming K Means, primarily because of its KMeans++ 
initialization (which I could probably implement myself, but I'm trying 
to avoid that, since it is already implemented). I also understand 
that it can be faster than the usual K Means, since it does a single 
clustering pass over the data before the Ball K Means step. Please 
correct me if you disagree with anything I said.
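
For reference, the seeding step I would otherwise have to write myself 
looks roughly like the sketch below. This is the generic k-means++ idea 
in plain Python (each next seed is drawn with probability proportional 
to its squared distance from the nearest seed chosen so far). The name 
`kmeans_pp_seeds` is hypothetical, and I am not claiming this is how 
Mahout implements it:

```python
import random

def kmeans_pp_seeds(points, k, rng=None):
    # Pick k initial centroids from points using k-means++ style seeding.
    rng = rng or random.Random()
    seeds = [rng.choice(points)]  # first seed: uniformly at random
    while len(seeds) < k:
        # Weight of each point: squared distance to its nearest seed.
        weights = [
            min(sum((x - y) ** 2 for x, y in zip(p, s)) for s in seeds)
            for p in points
        ]
        if sum(weights) == 0:
            break  # every point coincides with a seed; nothing left to pick
        seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds
```

Points that are already seeds get weight zero, so they are not drawn 
again, and points far from every seed are the most likely next picks.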

Maybe I'm doing something wrong, but I'm getting only one file as 
output, part-r-00000, whereas I was expecting something like the 
ClusteredPoints and Clusters-*-final output that KMeans produces. How 
can I get and read in the centroids and clustered points?

Also, I see a qualcluster step in the examples/bin/cluster-reuters.sh 
script that you pointed me to. What is it used for?

Thanks,
Marko

On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote:
> This was replied to earlier with the details u r looking for, repeating
> here again:
>
>
> See
> http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471
> for how to invoke Streaming Kmeans
>
> Also look at examples/bin/cluster-reuters.sh for the Streaming KMeans
> option.
>
>
> If all that u r looking for is centroids and distances from centroids,
> wouldn't KMeans suffice?  It would help if u could provide more details as
> to what u r trying to accomplish here.
>
>
> On Mon, Sep 29, 2014 at 9:55 AM, Marko <ma...@nissatech.com> wrote:
>
>> Hello everyone,
>>
>> I have previously asked a question about Streaming K Means examples, and
>> got an answer that there are not so many available.
>>
>> Can anyone give me an example of how to call Streaming K Means clustering
>> for a dataset, and how to get the results?
>>
>> What are the results, are they the same as in basic K Means? Do I get
>> centroids and clustered points? And do I get the distance between point and
>> its centroid, like in K Means?
>>
>> I would like to run Streaming K Means clustering on a dataset, and read in
>> centroids, and also I need the distance from the points to their given
>> centroids. How to do that?
>>
>> Thanks
>>
>

--
Regards,
Marko Dinić