Posted to user@mahout.apache.org by Jerry Ye <je...@yahoo-inc.com> on 2010/01/27 04:29:43 UTC

Viewing clustering results for Dirichlet Process Clustering

I'm trying to view the output of my experiment using Dirichlet Process Clustering.  When attempting to use the ClusterDumper utility on the output directory, an exception is thrown.  Upon looking closer, DirichletCluster does not extend ClusterBase.  The error is below.

Is there some other way that I can view the cluster labels?

Thanks!

- jerry

-bash-3.1$ java -cp mahout-core-0.3-SNAPSHOT.jar:mahout-utils-0.3-SNAPSHOT.jar:$( echo dependency/*.jar . | sed 's/ /:/g') org.apache.mahout.utils.clustering.ClusterDumper -s mahoutout/state-0
Input Path: /homes/jerryye/mahout/mahoutout/state-0/part-0
Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.dirichlet.DirichletCluster cannot be cast to org.apache.mahout.clustering.ClusterBase
    at org.apache.mahout.utils.clustering.ClusterDumper.printClusters(ClusterDumper.java:119)
    at org.apache.mahout.utils.clustering.ClusterDumper.main(ClusterDumper.java:251)
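
The trace comes down to an unchecked downcast between two classes that do not share the expected parent. A minimal Java sketch of that failure mode follows. CastSketch and its nested classes are hypothetical stand-ins, not Mahout's actual code, and the checked variant is only one possible way a dumper could avoid the exception.

```java
public class CastSketch {
    // Hypothetical stand-ins for the two unrelated types named in the trace.
    static class ClusterBase {
        String describe() { return "ClusterBase"; }
    }

    static class DirichletCluster {
        String describe() { return "DirichletCluster"; }
    }

    // Mimics a dumper that assumes every stored value is a ClusterBase.
    static String dumpUnchecked(Object value) {
        // Throws ClassCastException when value is not a ClusterBase.
        return ((ClusterBase) value).describe();
    }

    // Defensive variant: check the runtime type before casting.
    static String dumpChecked(Object value) {
        if (value instanceof ClusterBase) {
            return ((ClusterBase) value).describe();
        }
        return "unsupported: " + value.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Object stored = new DirichletCluster();
        System.out.println(dumpChecked(stored));
        try {
            dumpUnchecked(stored);
        } catch (ClassCastException e) {
            System.out.println("unchecked cast failed: " + e.getClass().getName());
        }
    }
}
```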

Re: Viewing clustering results for Dirichlet Process Clustering

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
I agree, but this will require an API extension to Model, as I suggested 
below, because each model type has its own parameters that need to be 
represented. I'll open a Jira for it.

Jeff

Grant Ingersoll wrote:
> We probably should have ClusterDumper still handle Dirichlet jobs, so that users don't need to deal w/ more than one interface.  


Re: Viewing clustering results for Dirichlet Process Clustering

Posted by Grant Ingersoll <gs...@apache.org>.
We probably should have ClusterDumper still handle Dirichlet jobs, so that users don't need to deal w/ more than one interface.  


--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem using Solr/Lucene: http://www.lucidimagination.com/search


Re: Viewing clustering results for Dirichlet Process Clustering

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Hi Jerry,

DirichletClusters are not similar enough to ClusterBase to make that 
workable, so you are correct that the utility won't dump them. Writing a 
dump utility that can is a great idea, though it does tend to be rather 
Model specific. Maybe Models should have some printable representation 
a-la asFormatString().
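
A rough sketch of what such a printable representation could look like, assuming a hypothetical PrintableModel interface. None of the names below (ModelDumpSketch, PrintableModel, NormalModelSketch, CountModelSketch) come from Mahout; only the method name asFormatString() is borrowed from the suggestion above. The point is that each model type formats its own parameters, so the dump routine stays generic.

```java
import java.util.Arrays;
import java.util.List;

public class ModelDumpSketch {
    // Hypothetical interface: each model type renders its own parameters.
    interface PrintableModel {
        String asFormatString();
    }

    // Hypothetical model described by a mean vector and one standard deviation.
    static class NormalModelSketch implements PrintableModel {
        private final double[] mean;
        private final double stdDev;

        NormalModelSketch(double[] mean, double stdDev) {
            this.mean = mean.clone();
            this.stdDev = stdDev;
        }

        @Override
        public String asFormatString() {
            return "normal{mean=" + Arrays.toString(mean) + ", sd=" + stdDev + "}";
        }
    }

    // Hypothetical model that only tracks how many points it has absorbed.
    static class CountModelSketch implements PrintableModel {
        private final int count;

        CountModelSketch(int count) {
            this.count = count;
        }

        @Override
        public String asFormatString() {
            return "count{n=" + count + "}";
        }
    }

    // The dump routine needs no knowledge of the concrete model types.
    static String dump(List<? extends PrintableModel> models) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < models.size(); i++) {
            out.append("model ").append(i).append(": ")
               .append(models.get(i).asFormatString()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<PrintableModel> models = Arrays.<PrintableModel>asList(
            new NormalModelSketch(new double[] {1.0, 2.0}, 0.5),
            new CountModelSketch(3));
        System.out.print(dump(models));
    }
}
```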

Look at the code in

 /MahoutTrunk/utils/src/test/java/org/apache/mahout/clustering/dirichlet/TestL1ModelClustering.java
 /MahoutTrunk/examples/src/main/java/org/apache/mahout/clustering/dirichlet/DisplayOutputState.java

 for ideas on how you might be able to dump out your DirichletClusters 
and their Models.

I've actually considered making ClusterBase into a Model and 
generalizing DirichletCluster to be the root of all clusters. I think 
the distance measures used by canopy and k-means could be cast as Model 
pdfs but the whole idea is still only half-baked.
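
One way that idea might be sketched, purely as an illustration: wrap a distance measure in an adapter that scores a point as exp(-distance to the center). That score is an unnormalized density rather than a true pdf, and choosing the highest score is the same as choosing the nearest center. The interfaces and names below (DistancePdfSketch, DistanceMeasure, Model, asModel, mostLikely) are assumptions made for this example and are not Mahout's API.

```java
public class DistancePdfSketch {
    // Hypothetical minimal interfaces, not Mahout's.
    interface DistanceMeasure {
        double distance(double[] a, double[] b);
    }

    interface Model {
        double pdf(double[] x);
    }

    // Plain Euclidean distance between two vectors of equal length.
    static final DistanceMeasure EUCLIDEAN = (a, b) -> {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    };

    // Adapter: a center plus a distance measure becomes a Model whose
    // score is exp(-distance), an unnormalized density that decreases
    // as the point moves away from the center.
    static Model asModel(double[] center, DistanceMeasure measure) {
        return x -> Math.exp(-measure.distance(center, x));
    }

    // Index of the model with the highest score for x. With the adapter
    // above this matches a nearest-center assignment.
    static int mostLikely(double[] x, Model[] models) {
        int best = 0;
        for (int i = 1; i < models.length; i++) {
            if (models[i].pdf(x) > models[best].pdf(x)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Model[] models = {
            asModel(new double[] {0.0, 0.0}, EUCLIDEAN),
            asModel(new double[] {5.0, 5.0}, EUCLIDEAN)
        };
        System.out.println(mostLikely(new double[] {4.0, 4.0}, models));
    }
}
```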

Jeff
