Posted to user@mahout.apache.org by "Videnova, Svetlana" <sv...@logica.com> on 2012/08/06 16:33:39 UTC

ClusterDumper eclipse human readable output kmeans

Hi,

My goal is to transform the vectors created by lucene.vector (after kmeans clustering) into a human-readable format. For that I am using the ClusterDumper class in Eclipse, but the code below does not generate any files. What am I missing? What is the best approach to transform the output of kmeans into a human-readable format? (No Unix commands please; I am on Windows using Eclipse and Cygwin.)
This is the code:


Code :

Map<Integer, List<WeightedVectorWritable>> result = ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);

System.out.println(result.get(0).toString());
for (int j = 0; j < result.size(); j++) {
    List<WeightedVectorWritable> list = result.get(j);
    for (WeightedVectorWritable vector : list) {
        System.out.println(vector.getVector().asFormatString());
    }
}


Error :

Exception in thread "main" java.lang.ClassCastException: org.apache.mahout.clustering.iterator.ClusterWritable cannot be cast to org.apache.mahout.clustering.classify.WeightedVectorWritable
      at main.LuceneDemo.main(LuceneDemo.java:260)



Thank you


Think green - keep it on the screen.

This e-mail and any attachment is for authorised use by the intended recipient(s) only. It may contain proprietary material, confidential information and/or be subject to legal privilege. It should not be copied, disclosed to, retained or used by, any other party. If you are not an intended recipient then please promptly delete this e-mail and any attachment and all copies and inform the sender. Thank you.


RE: ClusterDumper eclipse human readable output kmeans

Posted by "Videnova, Svetlana" <sv...@logica.com>.
I just managed to get my app working. I had to use ClusterDumperWriter.getTopFeatures(arg1, arg2, arg3), which gave me the top words in a human-readable format :D
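
For readers wondering what "top features" amounts to, here is a minimal plain-Java sketch of the idea: treat a cluster centroid as an array of term weights, sort the term indices by weight, and look each index up in a dictionary of terms. This is not Mahout's implementation and does not call Mahout at all; the class name TopFeaturesSketch, the method topFeatures, and the sample dictionary are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; not part of Mahout.
class TopFeaturesSketch {

    // Returns up to numTerms "term => weight" strings, highest weight first.
    // weights[i] is the weight of the term dictionary[i]; zero weights are skipped.
    static List<String> topFeatures(double[] weights, String[] dictionary, int numTerms) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            if (weights[i] != 0.0) {
                indices.add(i);
            }
        }
        // Sort indices by descending weight; ties keep index order (stable sort).
        indices.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());

        List<String> result = new ArrayList<>();
        for (int k = 0; k < Math.min(numTerms, indices.size()); k++) {
            int index = indices.get(k);
            result.add(dictionary[index] + " => " + weights[index]);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apple", "banana", "cherry", "date"};
        double[] centroid = {0.5, 2.0, 0.0, 1.25};
        for (String line : topFeatures(centroid, dictionary, 2)) {
            System.out.println(line);
        }
    }
}
```

In a real run the weights would come from a cluster's centroid vector and the dictionary from the term dictionary produced when the vectors were created; both are replaced by small arrays here.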



-----Original Message-----
From: Paritosh Ranjan [mailto:pranjan@xebia.com]
Sent: Tuesday, 7 August 2012 10:32
To: user@mahout.apache.org
Subject: Re: ClusterDumper eclipse human readable output kmeans

I don't know why ClusterDumper is not working, but I can suggest an alternative solution.

Use ClusterOutputPostProcessor (clusterpp) on the clusters-*-final directory. https://cwiki.apache.org/MAHOUT/top-down-clustering.html
It will arrange the vectors in their respective directories. However, they will still be in the form of sequence files.

It's very simple to read a sequence file and write it out in a human-readable format.

Classes in the org.apache.mahout.common.iterator.sequencefile package can help to read the sequence files easily.




Re: ClusterDumper eclipse human readable output kmeans

Posted by Paritosh Ranjan <pr...@xebia.com>.
I don't know why ClusterDumper is not working, but I can suggest an
alternative solution.

Use ClusterOutputPostProcessor (clusterpp) on the clusters-*-final
directory. https://cwiki.apache.org/MAHOUT/top-down-clustering.html
It will arrange the vectors in their respective directories. However,
they will still be in the form of sequence files.

It's very simple to read a sequence file and write it out in a
human-readable format.

Classes in the org.apache.mahout.common.iterator.sequencefile package
can help to read the sequence files easily.
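
As a rough illustration of the "write it out in a human-readable format" half of this suggestion, here is a plain-Java sketch that turns key/value records into one text line each. The reading side is deliberately stubbed out with an in-memory list, because it depends on Hadoop and Mahout classes; in a real program the records would come from iterating over the sequence file. The class HumanReadableDumpSketch, the method format, and the sample records are hypothetical names and data, not Mahout API.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; the records stand in for sequence-file key/value pairs.
class HumanReadableDumpSketch {

    // Formats each record as "key<TAB>[v0, v1, ...]" followed by a newline.
    static String format(Iterable<Map.Entry<Integer, double[]>> records) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Integer, double[]> record : records) {
            out.append(record.getKey())
               .append('\t')
               .append(Arrays.toString(record.getValue()))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the pairs read from a sequence file (assumed shape: id -> vector).
        List<Map.Entry<Integer, double[]>> records = new ArrayList<>();
        records.add(new SimpleEntry<Integer, double[]>(0, new double[] {1.0, 0.0}));
        records.add(new SimpleEntry<Integer, double[]>(1, new double[] {0.5, 2.5}));
        System.out.print(format(records));
    }
}
```

The resulting string can be printed to the Eclipse console or written to a file, so no Unix command-line tools are involved.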

On 07-08-2012 12:50, Videnova, Svetlana wrote:
> I already generated the points directory when I ran the clustering (kmeans in my case).
> But for the moment I can't generate the clusterdump because of an error on this line:
> ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
> The second parameter is a double, but it wants an int, yet it does not accept an int... I am pretty confused.


RE: ClusterDumper eclipse human readable output kmeans

Posted by "Videnova, Svetlana" <sv...@logica.com>.
I already generated the points directory when I ran the clustering (kmeans in my case).
But for the moment I can't generate the clusterdump because of an error on this line:
ClusterDumper.readPoints(new Path("output/kmeans/clusters-0"), 2, conf);
The second parameter is a double, but it wants an int, yet it does not accept an int... I am pretty confused.



-----Original Message-----
From: kiran kumar [mailto:kirankumarsmail@gmail.com]
Sent: Monday, 6 August 2012 18:01
To: user@mahout.apache.org
Subject: Re: ClusterDumper eclipse human readable output kmeans

Hello,
Clusterdump actually shows you the top terms and the vectors of the centroid and of each document. But to identify which vectors belong to your documents, you need to generate the points directory when running the clustering algorithm, and then use that points directory when generating the cluster dump.

Thanks,
Kiran Bushireddy.




Re: ClusterDumper eclipse human readable output kmeans

Posted by kiran kumar <ki...@gmail.com>.
Hello,
Clusterdump actually shows you the top terms and the vectors of the
centroid and of each document. But to identify which vectors belong to
your documents, you need to generate the points directory when running
the clustering algorithm, and then use that points directory when
generating the cluster dump.
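
To make the role of the points directory a bit more concrete: it is what links each input vector to the cluster it was assigned to. A rough plain-Java sketch of that grouping step follows. It is not Mahout code; the names PointsByClusterSketch and groupByCluster, the sample document names, and the per-cluster cap are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of Mahout.
class PointsByClusterSketch {

    // clusterIds[i] is the cluster assigned to points[i].
    // Returns cluster ID -> members, keeping at most maxPointsPerCluster per cluster.
    static Map<Integer, List<String>> groupByCluster(int[] clusterIds, String[] points,
                                                     int maxPointsPerCluster) {
        Map<Integer, List<String>> result = new TreeMap<>();
        for (int i = 0; i < clusterIds.length; i++) {
            List<String> members = result.computeIfAbsent(clusterIds[i], id -> new ArrayList<>());
            if (members.size() < maxPointsPerCluster) {
                members.add(points[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[] clusterIds = {0, 1, 0, 0, 1};
        String[] docs = {"doc-a", "doc-b", "doc-c", "doc-d", "doc-e"};
        Map<Integer, List<String>> byCluster = groupByCluster(clusterIds, docs, 2);
        for (Map.Entry<Integer, List<String>> entry : byCluster.entrySet()) {
            System.out.println("cluster " + entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Without the per-point assignments there is nothing to group, which is why the cluster dump can only list documents per cluster once the points directory exists.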

Thanks,
Kiran Bushireddy.


-- 
Thanks & Regards,
Kiran Kumar