Posted to user@mahout.apache.org by Christopher Laux <ct...@gmail.com> on 2012/11/28 14:53:31 UTC

Empty clusteredPoints after Dirichlet clustering

Hi all,

I've run Dirichlet clustering, but the clustered points output is empty.
Specifically, clusteredPoints/part-m-00000 and part-m-00001 exist, but both
are empty SequenceFiles (120 bytes each). The clusters themselves (the
cluster-n directories) are populated.

Any hints as to what caused this?

Thanks,
Chris

Re: Empty clusteredPoints after Dirichlet clustering

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
By default, the classification phase of Dirichlet clustering assigns each
point only to its most likely cluster. Unlike the training phase, where
points are assigned statistically to likely clusters, this classification
can leave clusters empty even though those clusters have nonzero counts in
the final iteration. You can disable most-likely assignment and set a pdf
threshold (check the documentation for the options); points will then be
classified to every cluster whose pdf is greater than the threshold.
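To make the difference between the two classification modes concrete, here is a minimal sketch in plain Java. It is not Mahout's API: the class and method names (ClassificationSketch, mostLikely, aboveThreshold) and the per-cluster pdf values are hypothetical, chosen only to show how most-likely assignment can leave a cluster with no points while threshold-based assignment gives it some.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the two classification modes; not Mahout code.
public class ClassificationSketch {

    // Most-likely mode: emit only the index of the cluster with the highest pdf.
    static List<Integer> mostLikely(double[] pdfs) {
        int best = 0;
        for (int i = 1; i < pdfs.length; i++) {
            if (pdfs[i] > pdfs[best]) {
                best = i;
            }
        }
        List<Integer> result = new ArrayList<>();
        result.add(best);
        return result;
    }

    // Threshold mode: emit every cluster whose pdf is strictly greater than the threshold.
    static List<Integer> aboveThreshold(double[] pdfs, double threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < pdfs.length; i++) {
            if (pdfs[i] > threshold) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up pdfs of three points against three clusters.
        double[][] points = {
            {0.70, 0.20, 0.10},
            {0.60, 0.35, 0.05},
            {0.55, 0.40, 0.05}
        };
        int[] mostLikelyCounts = new int[3];
        int[] thresholdCounts = new int[3];
        for (double[] pdfs : points) {
            for (int c : mostLikely(pdfs)) {
                mostLikelyCounts[c]++;
            }
            for (int c : aboveThreshold(pdfs, 0.3)) {
                thresholdCounts[c]++;
            }
        }
        System.out.println("most likely: " + Arrays.toString(mostLikelyCounts));
        System.out.println("threshold:   " + Arrays.toString(thresholdCounts));
    }
}
```

With these numbers, most-likely mode prints [3, 0, 0]: cluster 1 receives no points even though every point has some affinity for it. Threshold mode with 0.3 prints [3, 2, 0], because two points also exceed the threshold for cluster 1. A threshold mode can therefore place one point in several clusters.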
