Posted to user@mahout.apache.org by keeyong han <ke...@hotmail.com> on 2013/02/07 01:52:56 UTC

How to dump/interpret CVB output

Hello there,

After some struggle, I managed to run cvb successfully, but dumping the output has turned out to be no easier. I tried to dump some keywords per cluster by running the following command:

mahout vectordump -i [final_state_output_directory_used_in_cvb_run] -o [output_file_path] --dictionary [dictionary_file_generated_in_vectorization]  --dictionaryType sequencefile --vectorSize 5 --sortVectors true --printKey true 

When I opened the output file, it looked something like this:
0       {�����:22.247111682871502,����:18.373163071757336,���:98.99212990547156,��:381.7630898807104,�:477.31989896222046}
10      {�����:18.69052909454572,����:36.154751708278106,���:128.69867172165564,�:963.769624051711,U:8.647090616806189}
20      {�����:17.571403244328565,����:85.64801880249307,���:78.07377559911669,��:347.51662400027806,�:871.9107248128981}
30      {�����:22.330329037961235,����:35.7514504363204,���:93.79495229393099,��:101.67298391572345,�:560.0330529905118}
40      {�����:7.139737125343593,����:46.70407309589953,���:105.44075086386623,��:350.2449503883152,�:903.5015132966541}
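
To inspect a dump like the one above, a minimal Python sketch follows. It assumes each line has the form "key, whitespace, {term:weight,...}" as shown in the sample output; the function names (parse_dump_line, top_terms, unreadable) are hypothetical helpers and not part of Mahout. It only lists the heaviest terms per key and flags terms that contain the Unicode replacement character or non-printable characters. It does not diagnose why the terms are unreadable.

```python
import re

# One "term:weight" pair; the weight is a decimal number, optionally with an exponent.
PAIR = re.compile(r"([^,{}]*):(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")

def parse_dump_line(line):
    """Split 'key {term:weight,...}' into (key, [(term, weight), ...])."""
    key, _, body = line.strip().partition("{")
    pairs = [(term, float(weight)) for term, weight in PAIR.findall(body)]
    return key.strip(), pairs

def top_terms(line, n=5):
    """Return the key and its n heaviest (term, weight) pairs."""
    key, pairs = parse_dump_line(line)
    return key, sorted(pairs, key=lambda p: p[1], reverse=True)[:n]

def unreadable(term):
    """True if the term contains U+FFFD or any non-printable character."""
    return "\ufffd" in term or not term.isprintable()

if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python inspect_dump.py [output_file_path]
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for raw in fh:
            if "{" not in raw:
                continue
            key, terms = top_terms(raw)
            bad = sum(unreadable(t) for t, _ in terms)
            print(key, terms, "unreadable terms:", bad)
```

Opening the file with errors="replace" is a deliberate choice here: bytes that are not valid UTF-8 become U+FFFD instead of raising an exception, so the script can still count how many terms are affected.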

I guess some parameter is missing and/or was given a wrong value? Please help me.

BTW I am using Mahout 0.8 on Hadoop 1.0.3.

Cheers,
-Keeyong