
Posted to user@mahout.apache.org by Christoph Hermann <ch...@guschtel.de> on 2010/01/11 17:30:04 UTC

Clustering Items

Hello,

I have never used Mahout before, and before I invest too much time
reading the API and source code, I thought I might get some pointers
from you.

I have several objects, each with 1..n attributes (actually long/double
values). I want to cluster these objects so that each cluster holds
objects that are similar with respect to those n attributes.
Then I want to be able to look up which cluster a given object is in and
which other objects also belong to that cluster.
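A minimal sketch of the lookup described above, assuming a clustering run has already given each object exactly one cluster id. `ClusterIndex`, `assign`, `clusterOf` and `neighboursOf` are hypothetical names for illustration, not Mahout classes or methods:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: it only stores cluster ids that
// some clustering run has already produced, one per object.
class ClusterIndex<T> {
    private final Map<T, Integer> assignment = new HashMap<>();
    private final Map<Integer, List<T>> members = new HashMap<>();

    // Record that `item` belongs to cluster `clusterId`.
    // Assumes each object is assigned exactly once.
    void assign(T item, int clusterId) {
        assignment.put(item, clusterId);
        members.computeIfAbsent(clusterId, id -> new ArrayList<>()).add(item);
    }

    // The cluster id of `item`, or null if it was never assigned.
    Integer clusterOf(T item) {
        return assignment.get(item);
    }

    // All other objects that share `item`'s cluster (empty if unassigned).
    List<T> neighboursOf(T item) {
        List<T> result = new ArrayList<>();
        Integer id = assignment.get(item);
        if (id == null) {
            return result;
        }
        for (T other : members.get(id)) {
            if (!other.equals(item)) {
                result.add(other);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ClusterIndex<String> index = new ClusterIndex<>();
        index.assign("a", 0);
        index.assign("b", 0);
        index.assign("c", 1);
        System.out.println(index.clusterOf("a"));    // prints 0
        System.out.println(index.neighboursOf("a")); // prints [b]
    }
}
```

Keeping both directions (object to cluster, cluster to members) makes both lookups a single hash access instead of a scan over all objects.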

I thought such a clustering would be possible using Mean Shift
from Mahout, since I don't know in advance how many clusters I will
have (otherwise I would probably use k-means).

So what I have to do is transform these objects into Vectors and then
cluster them using MeanShiftCanopy and some distance measure (probably
EuclideanDistanceMeasure to begin with).
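A sketch of the two steps above (attributes to vector, then Euclidean distance) in plain Java, assuming each object's long/double attributes can be read as `Number`s. `VectorSketch`, `toVector` and `euclidean` are hypothetical helpers; in Mahout itself you would build a DenseVector and use EuclideanDistanceMeasure instead:

```java
// Hypothetical helpers, not Mahout API: they mirror the two steps described
// above (attributes -> vector, then Euclidean distance between vectors).
class VectorSketch {
    // Convert a mix of long and double attribute values to one double[].
    // Note: longs above 2^53 lose precision when widened to double.
    static double[] toVector(Number... attributes) {
        double[] v = new double[attributes.length];
        for (int i = 0; i < attributes.length; i++) {
            v[i] = attributes[i].doubleValue();
        }
        return v;
    }

    // Euclidean distance sqrt(sum_i (a_i - b_i)^2); lengths must match.
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] x = toVector(3L, 0.0);
        double[] y = toVector(0L, 4.0);
        System.out.println(euclidean(x, y)); // prints 5.0
    }
}
```

If the attributes have very different ranges, Euclidean distance is dominated by the widest one, so normalizing each attribute first is usually worth considering.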

I would create each vector with

foo = new DenseVector(new double[]{ val1, ..., valn});

and then basically follow what testReferenceImplementation() does in
the DisplayMeanShift class (DisplayMeanShift is my entry point so far).
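To see what mean shift does without any Mahout setup, here is a simplified single-machine sketch with a flat kernel: each point is repeatedly moved to the mean of all points within `bandwidth`, and points whose final positions lie within `mergeRadius` of each other share a cluster. This is an illustration of the general technique under those assumptions, not the MeanShiftCanopy algorithm itself; `MeanShiftSketch`, `bandwidth` and `mergeRadius` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, single-machine flat-kernel mean shift. An illustration of the
// idea, not Mahout's MeanShiftCanopy implementation.
class MeanShiftSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    // Repeatedly move `start` to the mean of all points within `bandwidth`
    // (Euclidean) until it moves less than `eps` or `maxIter` is reached.
    static double[] findMode(double[][] points, double[] start,
                             double bandwidth, double eps, int maxIter) {
        double[] center = start.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double[] mean = new double[center.length];
            int count = 0;
            for (double[] p : points) {
                if (dist2(p, center) <= bandwidth * bandwidth) {
                    for (int d = 0; d < mean.length; d++) {
                        mean[d] += p[d];
                    }
                    count++;
                }
            }
            if (count == 0) {
                break; // no neighbours in range: stay where we are
            }
            for (int d = 0; d < mean.length; d++) {
                mean[d] /= count;
            }
            double moved = dist2(mean, center);
            center = mean;
            if (moved < eps * eps) {
                break;
            }
        }
        return center;
    }

    // Label each point by the mode it converges to; modes closer than
    // `mergeRadius` are treated as the same cluster.
    static int[] cluster(double[][] points, double bandwidth, double mergeRadius) {
        List<double[]> modes = new ArrayList<>();
        int[] labels = new int[points.length];
        for (int i = 0; i < points.length; i++) {
            double[] mode = findMode(points, points[i], bandwidth, 1e-6, 100);
            int label = -1;
            for (int m = 0; m < modes.size(); m++) {
                if (dist2(mode, modes.get(m)) <= mergeRadius * mergeRadius) {
                    label = m;
                    break;
                }
            }
            if (label < 0) {
                modes.add(mode);
                label = modes.size() - 1;
            }
            labels[i] = label;
        }
        return labels;
    }

    public static void main(String[] args) {
        double[][] points = {
            {1.0, 1.0}, {1.2, 0.9}, {0.9, 1.1}, {8.0, 8.0}, {8.1, 7.9}
        };
        int[] labels = cluster(points, 2.0, 0.5);
        System.out.println(Arrays.toString(labels)); // prints [0, 0, 0, 1, 1]
    }
}
```

The number of clusters is never given; it falls out of the bandwidth, which is why mean shift is attractive when k is unknown. The cost is O(n^2) distance computations per iteration in this naive form.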

Is that correct? Is there any other example doing something similar that I
could look at?

Any additional pointers are welcome. I have already read Grant
Ingersoll's IBM article.

regards
Christoph Hermann

Re: Clustering Items

Posted by Ted Dunning <te...@gmail.com>.

k-means is more widely used. You might consider running k-means several times.
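One way to read "run k-means several times" is random restarts: run Lloyd's k-means from several random initializations and keep the result with the lowest within-cluster sum of squared distances (SSE). A plain-Java sketch under that assumption, not Mahout's implementation; `KMeansRestarts`, `runOnce` and `bestOf` are hypothetical names:

```java
import java.util.Arrays;
import java.util.Random;

// Plain-Java Lloyd's k-means with random restarts; a sketch of the
// "run k-means several times" idea, not Mahout's k-means code.
class KMeansRestarts {
    static final class Result {
        final int[] labels;
        final double sse; // sum of squared distances to assigned centres
        Result(int[] labels, double sse) { this.labels = labels; this.sse = sse; }
    }

    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    // One k-means run; requires 1 <= k <= pts.length.
    static Result runOnce(double[][] pts, int k, Random rnd, int maxIter) {
        int n = pts.length, dim = pts[0].length;
        // Initialise centres with k distinct points (partial Fisher-Yates shuffle).
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) idx[i] = i;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            int j = c + rnd.nextInt(n - c);
            int tmp = idx[c]; idx[c] = idx[j]; idx[j] = tmp;
            centers[c] = pts[idx[c]].clone();
        }
        int[] labels = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            // Assignment step: each point goes to its nearest centre.
            boolean changed = (iter == 0);
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best])) best = c;
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            if (!changed) break;
            // Update step: centre = mean of its points; empty clusters keep theirs.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) sums[labels[i]][d] += pts[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;
                for (int d = 0; d < dim; d++) centers[c][d] = sums[c][d] / counts[c];
            }
        }
        double sse = 0.0;
        for (int i = 0; i < n; i++) sse += dist2(pts[i], centers[labels[i]]);
        return new Result(labels, sse);
    }

    // Run `restarts` times from different random starts and keep the lowest SSE.
    static Result bestOf(double[][] pts, int k, int restarts, long seed) {
        Random rnd = new Random(seed);
        Result best = null;
        for (int r = 0; r < restarts; r++) {
            Result res = runOnce(pts, k, rnd, 100);
            if (best == null || res.sse < best.sse) best = res;
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] pts = { {1.0, 1.0}, {1.2, 0.9}, {0.9, 1.1}, {8.0, 8.0}, {8.1, 7.9} };
        Result r = bestOf(pts, 2, 10, 42L);
        System.out.println(Arrays.toString(r.labels) + " sse=" + r.sse);
    }
}
```

The label numbers themselves are arbitrary (a run may call either group 0). If you also try several values of k, note that SSE tends to fall as k grows, so raw SSE alone is not a good way to pick k.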

There is also Dirichlet process clustering, which needs a bit more tweaking
than k-means, but it can infer the number of clusters for you.
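To give a feel for how a Dirichlet-process-based method can pick the number of clusters, here is a sketch of DP-means, a hard-clustering k-means variant derived from a Dirichlet process mixture. This is a deliberately swapped-in, much simpler technique: it is NOT Mahout's Dirichlet process clustering. `DpMeansSketch` is a hypothetical name, and `lambda` is a penalty compared against squared Euclidean distance; any point farther than that from every centre opens a new cluster:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of DP-means, NOT Mahout's Dirichlet process clustering: a penalty
// `lambda` (on squared distance) lets the data decide the number of clusters.
class DpMeansSketch {
    static double dist2(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) { double d = a[i] - b[i]; s += d * d; }
        return s;
    }

    static int[] cluster(double[][] pts, double lambda, int maxIter) {
        int n = pts.length, dim = pts[0].length;
        List<double[]> centers = new ArrayList<>();
        // Start with a single cluster centred on the global mean.
        double[] mean = new double[dim];
        for (double[] p : pts) {
            for (int d = 0; d < dim; d++) mean[d] += p[d] / n;
        }
        centers.add(mean);
        int[] labels = new int[n];
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            for (int i = 0; i < n; i++) {
                int best = 0;
                for (int c = 1; c < centers.size(); c++) {
                    if (dist2(pts[i], centers.get(c)) < dist2(pts[i], centers.get(best))) best = c;
                }
                // Too far from every centre: open a new cluster at this point.
                if (dist2(pts[i], centers.get(best)) > lambda) {
                    centers.add(pts[i].clone());
                    best = centers.size() - 1;
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            // Update step: move each non-empty cluster's centre to its mean.
            int k = centers.size();
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < n; i++) {
                counts[labels[i]]++;
                for (int d = 0; d < dim; d++) sums[labels[i]][d] += pts[i][d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue;
                for (int d = 0; d < dim; d++) centers.get(c)[d] = sums[c][d] / counts[c];
            }
            if (!changed) break;
        }
        // Renumber non-empty clusters 0, 1, 2, ... in order of first appearance.
        Map<Integer, Integer> renumber = new HashMap<>();
        int[] out = new int[n];
        for (int i = 0; i < n; i++) {
            Integer id = renumber.get(labels[i]);
            if (id == null) { id = renumber.size(); renumber.put(labels[i], id); }
            out[i] = id;
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] pts = { {1.0, 1.0}, {1.2, 0.9}, {0.9, 1.1}, {8.0, 8.0}, {8.1, 7.9} };
        System.out.println(Arrays.toString(cluster(pts, 4.0, 100))); // prints [0, 0, 0, 1, 1]
    }
}
```

Instead of choosing k you choose `lambda`: a smaller value yields more clusters. Like k-means, the result depends on point order and can be a local optimum.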




-- 
Ted Dunning, CTO
DeepDyve