Posted to user@mahout.apache.org by Necati Demir <nd...@demir.web.tr> on 2012/08/09 23:12:25 UTC

How to find characteristics of the clusters with mahout?

Hello,

I am using Mahout 0.8 and, after clustering my data, I use this command to
see the results:

> mahout clusterdump --seqFileDir clusters/clusters-77/ --pointsDir
> clusters/clusteredPoints/

I also want to learn why rows end up in the same cluster. I think that, to
find out, I could write code that identifies which features/dimensions are
similar within a cluster.
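
One way to sketch that idea, assuming the member vectors of a single cluster have already been loaded into a plain double[][] (the class and method names here are hypothetical and are not part of Mahout): rank the dimensions by their variance inside the cluster. Dimensions with low variance are the ones the members agree on. Variance depends on the scale of each feature, so this only makes sense if the features are comparable or normalized.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Sketch: rank the dimensions of one cluster by how closely its members agree on them. */
final class ClusterDimensionSpread {

    /** Per-dimension mean of the member vectors (all rows must have the same length). */
    static double[] mean(double[][] members) {
        double[] mean = new double[members[0].length];
        for (double[] row : members) {
            for (int d = 0; d < mean.length; d++) {
                mean[d] += row[d] / members.length;
            }
        }
        return mean;
    }

    /** Per-dimension population variance of the member vectors. */
    static double[] variance(double[][] members) {
        double[] mean = mean(members);
        double[] var = new double[mean.length];
        for (double[] row : members) {
            for (int d = 0; d < mean.length; d++) {
                double diff = row[d] - mean[d];
                var[d] += diff * diff / members.length;
            }
        }
        return var;
    }

    /** Dimension indices ordered from lowest variance (most agreement) to highest. */
    static List<Integer> dimensionsByAgreement(double[][] members) {
        double[] var = variance(members);
        List<Integer> order = new ArrayList<>();
        for (int d = 0; d < var.length; d++) {
            order.add(d);
        }
        order.sort(Comparator.comparingDouble(d -> var[d]));
        return order;
    }

    public static void main(String[] args) {
        // Hypothetical cluster of three members with three features each.
        double[][] members = { {1.0, 5.0, 0.2}, {1.0, 9.0, 0.4}, {1.0, 1.0, 0.3} };
        System.out.println(dimensionsByAgreement(members));
    }
}
```

Comparing the cluster mean returned by mean() against the mean of the whole data set would additionally show which dimensions set this cluster apart from the others.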

Without writing code, can I find out why rows are clustered in the same
cluster?

**In a nutshell: I want to learn the characteristics of the clusters.**


-- 
Necati DEMİR
--------------------

Re: How to find characteristics of the clusters with mahout?

Posted by Kiran Kumar Bushireddy <ki...@gmail.com>.
It depends on the important keywords in each document. Documents with
similar keywords will be mapped to the same cluster. It all comes down to
the distance calculations: the distance from each centroid to each document
is calculated, and the documents closest to a given centroid form a cluster.
You can evaluate the clusters by passing the -e parameter to clusterdump,
which will give you the intra-cluster and inter-cluster density.
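
The distance step described above can be sketched as follows. This assumes squared Euclidean distance and plain double[] vectors; the actual result depends on whichever distance measure the clustering job was run with, and the class below is a hypothetical illustration, not Mahout code. The contributions method shows how much each dimension adds to the distance, which is one way to see why a vector sits close to one centroid and far from another.

```java
/** Sketch: assign a vector to its nearest centroid and break the distance down by dimension. */
final class NearestCentroid {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the point (ties go to the lower index). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /** Squared difference per dimension; the entries sum to squaredDistance(point, centroid). */
    static double[] contributions(double[] point, double[] centroid) {
        double[] parts = new double[point.length];
        for (int d = 0; d < point.length; d++) {
            double diff = point[d] - centroid[d];
            parts[d] = diff * diff;
        }
        return parts;
    }

    public static void main(String[] args) {
        // Hypothetical centroids and document vector.
        double[][] centroids = { {0.0, 0.0}, {10.0, 10.0} };
        double[] doc = {9.0, 7.0};
        int cluster = nearest(doc, centroids);
        System.out.println("assigned to cluster " + cluster);
        System.out.println(java.util.Arrays.toString(contributions(doc, centroids[cluster])));
    }
}
```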

Thanks,
Kiran

On Fri, Aug 10, 2012 at 2:30 AM, Necati Demir <nd...@demir.web.tr> wrote:

> That's right; I want to learn why vectors are being assigned to any
> particular cluster.
> Suppose that each vector represents a person's behaviour. I want to learn
> which behaviour patterns are present in the cluster.



-- 
Thanks & Regards,
Kiran Kumar

Re: How to find characteristics of the clusters with mahout?

Posted by Paritosh Ranjan <pr...@xebia.com>.
Maybe this can help:

ClusterDumperWriter.getTopFeatures(Vector vector, String[] dictionary, int numTerms)
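
To illustrate the idea behind a "top features" lookup, here is a minimal stand-alone sketch that works on plain arrays. It is not Mahout's implementation and does not call the method above; the class name, the sample dictionary, and the choice to return only the term names are assumptions made for illustration. It picks the numTerms dimensions with the largest weight in a cluster centroid and maps them to their names through the dictionary.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: name the highest-weighted dimensions of a cluster centroid. */
final class TopFeaturesSketch {

    /**
     * Returns up to numTerms dictionary entries, ordered by descending centroid weight.
     * Dimensions with a weight of exactly zero are skipped.
     */
    static List<String> topFeatures(double[] centroid, String[] dictionary, int numTerms) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < centroid.length; i++) {
            if (centroid[i] != 0.0) {
                indices.add(i);
            }
        }
        // Largest weight first.
        indices.sort((a, b) -> Double.compare(centroid[b], centroid[a]));

        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(numTerms, indices.size()); i++) {
            top.add(dictionary[indices.get(i)]);
        }
        return top;
    }

    public static void main(String[] args) {
        // Hypothetical centroid over four behaviour features.
        double[] centroid = {0.1, 0.9, 0.0, 0.5};
        String[] dictionary = {"login", "purchase", "refund", "search"};
        System.out.println(topFeatures(centroid, dictionary, 2));
    }
}
```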

On 10-08-2012 12:00, Necati Demir wrote:
> That's right; I want to learn why vectors are being assigned to any
> particular cluster.
> Suppose that each vector represents a person's behaviour. I want to learn
> which behaviour patterns are present in the cluster.



Re: How to find characteristics of the clusters with mahout?

Posted by Necati Demir <nd...@demir.web.tr>.
That's right; I want to learn why vectors are being assigned to any
particular cluster.
Suppose that each vector represents a person's behaviour. I want to learn
which behaviour patterns are present in the cluster.

On 10 August 2012 08:06, Paritosh Ranjan <pr...@xebia.com> wrote:

> I think you want to know why vectors are being assigned to any particular
> cluster.
> Different clustering algorithms work in different ways, so I think some
> code will be needed for it.
>
> The way I do it is by taking a small set of vectors and debugging the
> clustering algorithm using its sequential version.
> It's fast and makes things clear.
>
> There are also certain cluster evaluators that might help, but I don't
> know much about them; try having a look at them as well.


-- 
Necati DEMİR
--------------------

Re: How to find characteristics of the clusters with mahout?

Posted by Paritosh Ranjan <pr...@xebia.com>.
I think you want to know why vectors are being assigned to any
particular cluster.
Different clustering algorithms work in different ways, so I think some
code will be needed for it.

The way I do it is by taking a small set of vectors and debugging the
clustering algorithm using its sequential version.
It's fast and makes things clear.
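
As a rough stand-in for that kind of debugging session, the sketch below runs a tiny in-memory k-means over a handful of vectors and prints the assignments after every iteration. It is a simplified illustration under stated assumptions (squared Euclidean distance, caller-supplied initial centroids, hypothetical class and method names) and is not the sequential implementation that ships with Mahout.

```java
import java.util.Arrays;

/** Sketch: a tiny sequential k-means that prints each iteration's assignments. */
final class TinyKMeansTrace {

    /** Squared Euclidean distance between two equal-length vectors. */
    static double dist(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            sum += (a[d] - b[d]) * (a[d] - b[d]);
        }
        return sum;
    }

    /**
     * Runs k-means until the assignments stop changing or maxIterations is reached.
     * The centroids array is updated in place; the return value is the cluster index per point.
     */
    static int[] run(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest centroid.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < centroids.length; c++) {
                    if (dist(points[p], centroids[c]) < dist(points[p], centroids[best])) {
                        best = c;
                    }
                }
                if (best != assignment[p]) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            System.out.println("iteration " + iter + ": " + Arrays.toString(assignment));
            if (!changed) {
                break;
            }
            // Update step: move each non-empty centroid to the mean of its points.
            for (int c = 0; c < centroids.length; c++) {
                double[] sum = new double[centroids[c].length];
                int count = 0;
                for (int p = 0; p < points.length; p++) {
                    if (assignment[p] == c) {
                        for (int d = 0; d < sum.length; d++) {
                            sum[d] += points[p][d];
                        }
                        count++;
                    }
                }
                for (int d = 0; count > 0 && d < sum.length; d++) {
                    centroids[c][d] = sum[d] / count;
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Hypothetical data: four points, two deliberately poor starting centroids.
        double[][] points = { {1.0, 1.0}, {1.5, 2.0}, {8.0, 8.0}, {9.0, 9.5} };
        double[][] centroids = { {1.0, 1.0}, {1.5, 2.0} };
        run(points, centroids, 10);
        System.out.println("final centroids: " + Arrays.deepToString(centroids));
    }
}
```

With only a few vectors, the per-iteration trace makes it easy to follow a single point as it moves between clusters.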

There are also certain cluster evaluators that might help, but I don't
know much about them; try having a look at them as well.

On 10-08-2012 02:42, Necati Demir wrote:
> Hello,
>
> I am using Mahout 0.8 and, after clustering my data, I use this command to
> see the results:
>
>> mahout clusterdump --seqFileDir clusters/clusters-77/ --pointsDir
>> clusters/clusteredPoints/
> I also want to learn why rows end up in the same cluster. I think that, to
> find out, I could write code that identifies which features/dimensions are
> similar within a cluster.
>
> Without writing code, can I find out why rows are clustered in the same
> cluster?
>
> **In a nutshell: I want to learn the characteristics of the clusters.**