Posted to user@mahout.apache.org by Wei Xia <su...@yahoo.com.INVALID> on 2014/11/04 11:47:46 UTC

Error when running clusterdump [Mahout 0.9 on Hadoop 2.3]

Hi, this problem has been troubling me for a couple of days and I still cannot find the cause. Can anyone help?

Here is my command:


mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50

First, I ran this command with the same data on a single VM, and there was no problem :)

Then I ran it on a real web server cluster, and the following happened :(

15:54 [username@servername]$ mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50

MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/hadoop/default/bin/hadoop and HADOOP_CONF_DIR=/opt/hadoop/default/etc/hadoop
MAHOUT-JOB: /home/username/apps/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/11/04 15:55:30 ERROR common.AbstractJob: Unexpected sequencefile while processing Job-Specific Options:
Unexpected sequencefile while processing Job-Specific Options:
Usage:
 [--input <input> --output <output> --outputFormat <outputFormat> --substring
<substring> --numWords <numWords> --pointsDir <pointsDir> --samplePoints
<samplePoints> --dictionary <dictionary> --dictionaryType <dictionaryType>
--evaluate --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input                         Path to job input directory.
  --output (-o) output                       The directory pathname for output.
  --outputFormat (-of) outputFormat          The optional output format for the
                                             results.  Options: TEXT, CSV, JSON
                                             or GRAPH_ML
  --substring (-b) substring                 The number of chars of the
                                             asFormatString() to print
  --numWords (-n) numWords                   The number of top terms to print
  --pointsDir (-p) pointsDir                 The directory containing points
                                             sequence files mapping input
                                             vectors to their cluster.  If
                                             specified, then the program will
                                             output the points associated with
                                             a cluster
  --samplePoints (-sp) samplePoints          Specifies the maximum number of
                                             points to include _per_ cluster.
                                             The default is to include all
                                             points
  --dictionary (-d) dictionary               The dictionary file
  --dictionaryType (-dt) dictionaryType      The dictionary file type
                                             (text|sequencefile)
  --evaluate (-e)                            Run ClusterEvaluator and
                                             CDbwEvaluator over the input.  The
                                             output will be appended to the
                                             rest of the output at the end.
  --distanceMeasure (-dm) distanceMeasure    The classname of the
                                             DistanceMeasure. Default is
                                             SquaredEuclidean
  --help (-h)                                Print out help
  --tempDir tempDir                          Intermediate output directory
  --startPhase startPhase                    First phase to run
  --endPhase endPhase                        Last phase to run
14/11/04 15:55:31 INFO driver.MahoutDriver: Program took 439 ms (Minutes: 0.007316666666666667)
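
For reference, here is a sketch of the same call written with the long option names that the usage text prints, in case that helps rule out a problem with how the short flags are parsed. It only prints the command so it can be inspected before running. Whether it behaves any differently on the cluster is untested, and the names JOB_DIR, OUT_DIR and build_cmd are made up for illustration.

```shell
#!/bin/sh
# Sketch only: rebuilds the clusterdump call using the long option names
# from the usage text (--input, --output, --pointsDir, --dictionaryType,
# --dictionary, --numWords). Paths are the ones from the original command.
# JOB_DIR, OUT_DIR and build_cmd are illustrative names, not part of Mahout.

JOB_DIR=video_tags_kmean_job
OUT_DIR='~/video_tags_clusters_dump'   # kept literal; the shell expands ~ when the printed command is run

build_cmd() {
  # echo joins its arguments with single spaces, giving one command line
  echo "mahout clusterdump" \
    "--input $JOB_DIR/clusters/clusters-10-final" \
    "--output $OUT_DIR" \
    "--pointsDir $JOB_DIR/clusters/clusteredPoints" \
    "--dictionaryType sequencefile" \
    "--dictionary $JOB_DIR/vectors/dictionary.file-0" \
    "--numWords 50"
}

# Print the command for inspection; nothing is executed here.
build_cmd
```

The printed line can be pasted into the same shell session on the cluster once it looks right.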




PS: I configured Mahout in /home/username/.bashrc:

# set mahout path
export MAHOUT_HOME=/home/username/apps/mahout-distribution-0.9
export MAHOUT_LOCAL=
export PATH=$PATH:$MAHOUT_HOME/bin
export CLASSPATH=$CLASSPATH:$MAHOUT_HOME/mahout-core-0.9.jar:$MAHOUT_HOME/mahout-math-0.9.jar:$MAHOUT_HOME/mahout-integration-0.9.jar

Thanks a lot

Spike