Posted to user@mahout.apache.org by yoshihiro fujimoto <yo...@gmail.com> on 2012/12/25 05:57:44 UTC
About Dirichlet clustering's threshold
Hi all,
https://cwiki.apache.org/MAHOUT/dirichlet-process-clustering.html
According to this page, a threshold can be specified for the Dirichlet driver.
The page explains that a threshold of 0 will emit all clusters with their
associated probabilities for each vector.
So I ran Dirichlet clustering with a threshold of 0.
But the clusteredPoints/part-m-00000 sequence file is empty (its length is 120
bytes).
With Dirichlet process clustering, can a threshold of 0 produce an empty result?
Thanks,
Yoshihiro
Re: About Dirichlet clustering's threshold
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
It could be a contradiction indeed. I wonder if you can help us to
characterize it further, perhaps by reading the code or by running your
data in sequential debug mode? Without a little more information it is
difficult to get to the root of your problem.
On 12/25/12 8:21 PM, yoshihiro fujimoto wrote:
> Hi Jeff.
>
>> Did you turn off most-likely classification?
> Yes, I set the most-likely option to false.
> In general, a pdf's range is between 0 and 1.
> So, if the pdf threshold is set to 0, all points should be classified to
> all of the clusters.
> In fact, the sequence file is empty.
>
> This seems like a contradiction.
> I may be wrong, but is this a bug?
>
> Thanks,
> Yoshihiro.
Re: About Dirichlet clustering's threshold
Posted by yoshihiro fujimoto <yo...@gmail.com>.
Hi Jeff.
> Did you turn off most-likely classification?
Yes, I set the most-likely option to false.
In general, a pdf's range is between 0 and 1.
So, if the pdf threshold is set to 0, all points should be classified to all of the
clusters.
In fact, the sequence file is empty.
This seems like a contradiction.
I may be wrong, but is this a bug?
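
The reasoning above can be checked numerically with a small sketch. It is not Mahout code: the Gaussian model, the helper names gaussian_pdf and normalize, and the strict "greater than" comparison (taken from the wording "pdf greater than the threshold" elsewhere in this thread) are all assumptions made for illustration. It shows that a raw density is not confined to [0, 1], that normalized per-cluster probabilities are, and that a density which underflows to exactly 0.0 would not pass a strict comparison against a threshold of 0. Whether any of this explains the empty file is not established here.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a 1-D normal distribution (hypothetical model for this sketch)."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normalize(values):
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


# Two hypothetical clusters with means 0.0 and 0.3, both with sigma 0.1.
raw = [gaussian_pdf(0.0, 0.0, 0.1), gaussian_pdf(0.0, 0.3, 0.1)]
probs = normalize(raw)

# A point very far from a cluster: the exponential underflows to 0.0.
far = gaussian_pdf(100.0, 0.0, 0.1)

print(raw[0] > 1.0)   # a narrow Gaussian has a peak density above 1
print(sum(probs))     # normalized values sum to 1 (up to rounding)
print(far > 0.0)      # a strict comparison with threshold 0 fails for 0.0
```

If the classifier compares with "greater than or equal" instead, a density of 0.0 would pass a threshold of 0, so the exact comparison used by the driver matters and is worth checking in the source.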
Thanks,
Yoshihiro.
2012/12/26 Jeff Eastman <jd...@windwardsolutions.com>
> Here's a response to a similar question from a couple of months ago:
>
> The classification phase of Dirichlet uses a most-likely assignment of
> points to clusters by default. This means that, unlike the training phase
> where points are assigned statistically to likely clusters, the
> classification may result in empty clusters even though those clusters have
> nonzero counts in the final iteration. You can disable most-likely
> assignment and set a pdf threshold - check the documentation - and points
> will be classified to all of the clusters that have pdf greater than the
> threshold.
>
> Does this help? Did you turn off most-likely classification?
> Jeff
Re: About Dirichlet clustering's threshold
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Here's a response to a similar question from a couple of months ago:
The classification phase of Dirichlet uses a most-likely assignment of
points to clusters by default. This means that, unlike the training
phase where points are assigned statistically to likely clusters, the
classification may result in empty clusters even though those clusters
have nonzero counts in the final iteration. You can disable most-likely
assignment and set a pdf threshold - check the documentation - and
points will be classified to all of the clusters that have pdf greater
than the threshold.
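
The difference between the two classification modes described above can be sketched as follows. This is a minimal illustration, not Mahout's implementation: the function classify, its parameters, and the sample probabilities are hypothetical, and the strict "greater than" test simply follows the wording of the paragraph.

```python
def classify(pdfs, most_likely=True, threshold=0.0):
    """Return the indices of the clusters one point is assigned to.

    pdfs holds one probability per cluster for a single point.
    (Hypothetical helper for illustration only.)
    """
    if most_likely:
        # Most-likely assignment: only the single best cluster gets the point.
        best = max(range(len(pdfs)), key=lambda i: pdfs[i])
        return [best]
    # Threshold assignment: every cluster whose pdf exceeds the threshold.
    return [i for i, p in enumerate(pdfs) if p > threshold]


# Two sample points with made-up probabilities over three clusters.
points = {"a": [0.70, 0.25, 0.05], "b": [0.60, 0.30, 0.10]}

most_likely = {name: classify(p) for name, p in points.items()}
thresholded = {
    name: classify(p, most_likely=False, threshold=0.2)
    for name, p in points.items()
}

print(most_likely)   # both points go to cluster 0; clusters 1 and 2 stay empty
print(thresholded)   # both points are assigned to clusters 0 and 1
```

With most-likely assignment, clusters 1 and 2 receive no points even though both points have nonzero probability for them, which is how clusters can come out empty after classification.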
Does this help? Did you turn off most-likely classification?
Jeff
On 12/24/12 11:57 PM, yoshihiro fujimoto wrote:
> Hi all,
>
>
> https://cwiki.apache.org/MAHOUT/dirichlet-process-clustering.html
>
> According to this page, a threshold can be specified for the Dirichlet driver.
> The page explains that a threshold of 0 will emit all clusters with their
> associated probabilities for each vector.
> So I ran Dirichlet clustering with a threshold of 0.
> But the clusteredPoints/part-m-00000 sequence file is empty (its length is 120
> bytes).
>
> With Dirichlet process clustering, can a threshold of 0 produce an empty result?
>
> Thanks,
>
> Yoshihiro
>