Posted to user@mahout.apache.org by Marko Dinić <ma...@nissatech.com> on 2014/10/02 11:45:54 UTC
Re: Streaming K Means
Suneel,
I thank you again for your answer.
I'm trying to implement a kind of cluster-based anomaly detection.
For that, I need to cluster normal examples, and then, when a new
example enters the system, I need to assign it to the nearest centroid (by
calculating the distance between the existing centroids and the new
example), and then I need the distances from the points in that cluster
to the centroid.
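
To make the idea concrete, here is a minimal sketch in plain Python of the assignment and scoring step described above. It is independent of Mahout: the function names, the in-memory list representation of centroids and clusters, the use of Euclidean distance, and the `factor` threshold rule are all assumptions made for illustration, not something taken from this thread or from Mahout's API.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_centroid(point, centroids):
    """Return (index, distance) of the centroid closest to point."""
    distances = [euclidean(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def is_anomalous(point, centroids, clusters, factor=1.5):
    """Flag point if it is farther from its nearest centroid than
    factor times the largest distance of any training point in that cluster.

    clusters[i] holds the (non-empty) list of normal points assigned to
    centroids[i]. The factor rule is a hypothetical threshold chosen for
    this sketch; a quantile of the member distances would work as well.
    """
    index, distance = nearest_centroid(point, centroids)
    member_distances = [euclidean(p, centroids[index]) for p in clusters[index]]
    return distance > factor * max(member_distances)
```

With centroids `[(0, 0), (10, 10)]` and a few normal points around each, a point such as `(0.5, 0.5)` would be accepted, while `(5, 5)` lies far from both centroids and would be flagged.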
I could use K Means for that, but I'm hoping to get better results
using Streaming K Means, primarily because of its KMeans++
initialization (which I could probably implement myself, but I'm trying
to avoid that, since it is already implemented). I also understand
that it can be faster than regular K Means, since it does a one-pass
clustering before the Ball K Means step. Please correct me if you
disagree with anything I said.
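
For reference, the KMeans++ seeding idea mentioned above can be sketched in a few lines of plain Python. This is only an illustration of the textbook procedure (first seed chosen uniformly, each later seed chosen with probability proportional to its squared distance from the nearest seed so far); it is not Mahout's implementation, and the function names and the optional `rng` parameter are assumptions of this sketch.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seeds(points, k, rng=None):
    """Pick k initial centroids from points using KMeans++-style seeding."""
    rng = rng or random.Random()
    # The first seed is drawn uniformly at random.
    seeds = [rng.choice(points)]
    while len(seeds) < k:
        # Weight each point by its squared distance to the closest seed so far.
        weights = [min(squared_distance(p, s) for s in seeds) for p in points]
        if sum(weights) == 0:
            # Every point coincides with an existing seed; fall back to uniform.
            seeds.append(rng.choice(points))
        else:
            seeds.append(rng.choices(points, weights=weights, k=1)[0])
    return seeds
```

Because points that coincide with an already chosen seed get weight zero, the seeds tend to be spread across the data, which is the property that makes this initialization attractive.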
Maybe I'm doing something wrong, but I'm getting only one file as
output, part-r-00000, while I was expecting something like
ClusteredPoints and Clusters-*-final, as with KMeans. How can I get
and read in the centroids and clustered points?
Also, I see a qualcluster step in the examples/bin/cluster-reuters.sh
script that you mentioned. What is it used for?
Thanks,
Marko
On Monday, 29 September 2014, 20:00:33 CEST, Suneel Marthi wrote:
> This was answered earlier with the details you are looking for; repeating
> it here:
>
>
> See
> http://stackoverflow.com/questions/17272296/how-to-use-mahout-streaming-k-means/18090471#18090471
> for how to invoke Streaming Kmeans
>
> Also look at examples/bin/cluster-reuters.sh for the Streaming KMeans
> option.
>
>
> If all that you are looking for is centroids and distances from centroids,
> wouldn't KMeans suffice? It would help if you could provide more details about
> what you are trying to accomplish here.
>
>
> On Mon, Sep 29, 2014 at 9:55 AM, Marko <ma...@nissatech.com> wrote:
>
>> Hello everyone,
>>
>> I have previously asked a question about Streaming K Means examples, and
>> got an answer that there are not so many available.
>>
>> Can anyone give me an example of how to call Streaming K Means clustering for
>> a dataset, and how to get the results?
>>
>> What are the results, are they the same as in basic K Means? Do I get
>> centroids and clustered points? And do I get the distance between point and
>> its centroid, like in K Means?
>>
>> I would like to run Streaming K Means clustering on a dataset, and read in
>> centroids, and also I need the distance from the points to their given
>> centroids. How can I do that?
>>
>> Thanks
>>
>
--
Regards,
Marko Dinić