Posted to user@mahout.apache.org by Sebastian Briesemeister <se...@unister-gmbh.de> on 2013/03/22 15:39:36 UTC

Retrieving Fuzzy Cluster Probabilities

Dear all,

I am having trouble retrieving the cluster probabilities of instances:

I am clustering instances using the FuzzyKMeansDriver.
Afterwards, I am reading instances of WeightedVectorWritable from the
clusteredPoints file (e.g. part-m-0).

1.)
When I cluster sequentially (no map-reduce), the weights of the
vectors are reasonable probabilities for the clusters. However, when I
run FuzzyKMeansDriver with sequential=false, the weight of each vector
equals one for EVERY cluster, so the weights do not even sum to 1.

Am I doing something wrong here?
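
A minimal sketch of the sanity check described in 1.), assuming the
per-cluster weights of a single point have already been collected into
an array. The class name, method name, tolerance and sample values are
hypothetical and not part of Mahout.

```java
class WeightCheckSketch {

    // Returns true if the per-cluster weights of one point sum to 1
    // within the given tolerance.
    static boolean sumsToOne(double[] weights, double tolerance) {
        double sum = 0.0;
        for (double w : weights) {
            sum += w;
        }
        return Math.abs(sum - 1.0) <= tolerance;
    }

    public static void main(String[] args) {
        // Hypothetical weights resembling the sequential run.
        double[] sequentialLike = {0.7, 0.2, 0.1};
        // Weight of one for every cluster, as seen with sequential=false.
        double[] mapReduceLike = {1.0, 1.0, 1.0};

        System.out.println(sumsToOne(sequentialLike, 1e-9)); // true
        System.out.println(sumsToOne(mapReduceLike, 1e-9));  // false
    }
}
```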


2.)
I tried to work around the problem by using the FuzzyKMeansClusterer
directly. After clustering, I retrieved the final clusters (class
Cluster) and calculated the distance of every instance to each of the
cluster centers. I then calculated the probabilities for each cluster
using the computeProbWeight method of FuzzyKMeansClusterer.

Interestingly, these probabilities differ from the probabilities I get
from the WeightedVectorWritable instances in the clusteredPoints file
when clustering with sequential=true.

Why is there a difference between the vector weights and the pdfs??
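
For reference, the calculation described in 2.) can be sketched as
follows. This assumes the standard fuzzy c-means membership formula,
u_i = 1 / sum_j (d_i / d_j)^(2 / (m - 1)), with fuzziness m > 1. That
computeProbWeight follows exactly this formula is an assumption and is
not checked against the Mahout source here. The class name, the method
name and the small clamp for zero distances are hypothetical.

```java
import java.util.Arrays;

class FuzzyMembershipSketch {

    // Assumed lower bound that avoids division by zero when a point
    // coincides with a cluster center.
    static final double MIN_DISTANCE = 1e-10;

    // Membership of one point in each cluster, given its distances to
    // the cluster centers and the fuzziness m (must be > 1).
    static double[] memberships(double[] distances, double m) {
        if (m <= 1.0) {
            throw new IllegalArgumentException("fuzziness m must be > 1");
        }
        double exponent = 2.0 / (m - 1.0);
        double[] u = new double[distances.length];
        for (int i = 0; i < distances.length; i++) {
            double di = Math.max(distances[i], MIN_DISTANCE);
            double denom = 0.0;
            for (double d : distances) {
                double dj = Math.max(d, MIN_DISTANCE);
                denom += Math.pow(di / dj, exponent);
            }
            u[i] = 1.0 / denom;
        }
        return u;
    }

    public static void main(String[] args) {
        // Hypothetical distances of one point to three cluster centers.
        double[] distances = {1.0, 2.0, 4.0};
        System.out.println(Arrays.toString(memberships(distances, 2.0)));
        System.out.println(Arrays.toString(memberships(distances, 4.0)));
    }
}
```

Under this formula the memberships always sum to 1 and the nearest
center always gets the largest value. A larger m flattens them towards
a uniform distribution. A different m or a different distance measure
between the two code paths is therefore one thing worth ruling out.
Whether that explains the difference asked about here is not
established in this thread.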

Thank you all in advance,
Sebastian


Re: Retrieving Fuzzy Cluster Probabilities

Posted by Sebastian Briesemeister <se...@unister.de>.
I do not see any relationship between the cluster weight vector and the pdf vector. Both are normalized to one. The pdf vector is closer to a uniform distribution than the weight vector from the clusteredPoints file, and both vectors have their maximum at the same cluster. Apart from that, I see no common ground.

Best regards 
Sebastian 



Jeff Eastman <jd...@windwardsolutions.com> wrote:

>On 3/22/13 10:39 AM, Sebastian Briesemeister wrote:
>> Dear all,
>>
>> I am facing troubles when retrieving the cluster probabilities of instances:
>>
>> I am clustering instances using the FuzzyKMeansDriver.
>> Afterwards, I am reading instances of WeightedVectorWritable from the
>> clusteredPoints file (e.g. part-m-0).
>>
>> 1.)
>> When I am clustering in a sequential manner (no map-reduce),  the
>> weights of the vectors are reasonable probabilities for the clusters.
>> However, when I am running FuzzyKMeansDriver with sequential=false, the
>> weight of each vector equals one for EVERY cluster. So the weights do
>> not even sum up to 1.
>>
>> Am I doing something wrong here?
>It sounds like you may have found a bug in the MR version. Those 
>probabilities should be the same.
>>
>>
>> 2.)
>> I tried to circumvent the problem, by using the FuzzyKMeansClusterer:
>> After clustering, I retrieved the final clusters (Class Cluster) and
>> calculated the distance of every instance to each of the cluster
>> centers. Then I calculated the probabilities for each cluster using the
>> computeProbWeight method of FuzzyKMeansClusterer.
>>
>> Interestingly, these probabilities differ from the probabilities I get
>> from the WeightedVectorWritable instances in the clusteredPoints file
>> when clustering with sequential=true.
>>
>> Why is there a difference between the vector weights and the pdfs??
>The pdf vectors are normalized I believe
>>
>> Thank you all in advance,
>> Sebastian
>>
>>
>>


Re: Retrieving Fuzzy Cluster Probabilities

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
On 3/22/13 10:39 AM, Sebastian Briesemeister wrote:
> Dear all,
>
> I am facing troubles when retrieving the cluster probabilities of instances:
>
> I am clustering instances using the FuzzyKMeansDriver.
> Afterwards, I am reading instances of WeightedVectorWritable from the
> clusteredPoints file (e.g. part-m-0).
>
> 1.)
> When I am clustering in a sequential manner (no map-reduce),  the
> weights of the vectors are reasonable probabilities for the clusters.
> However, when I am running FuzzyKMeansDriver with sequential=false, the
> weight of each vector equals one for EVERY cluster. So the weights do
> not even sum up to 1.
>
> Am I doing something wrong here?
It sounds like you may have found a bug in the MR version. Those 
probabilities should be the same.
>
>
> 2.)
> I tried to circumvent the problem, by using the FuzzyKMeansClusterer:
> After clustering, I retrieved the final clusters (Class Cluster) and
> calculated the distance of every instance to each of the cluster
> centers. Then I calculated the probabilities for each cluster using the
> computeProbWeight method of FuzzyKMeansClusterer.
>
> Interestingly, these probabilities differ from the probabilities I get
> from the WeightedVectorWritable instances in the clusteredPoints file
> when clustering with sequential=true.
>
> Why is there a difference between the vector weights and the pdfs??
The pdf vectors are normalized, I believe.
>
> Thank you all in advance,
> Sebastian
>
>
>