Posted to user@mahout.apache.org by Wei Xia <su...@yahoo.com.INVALID> on 2014/11/04 11:47:46 UTC
Error when running clusterdump [Mahout 0.9 on Hadoop 2.3]
Hi, this problem has troubled me for a couple of days and I still cannot find the cause. Can anyone help?
Here is my command:
mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50
First, I tried this command with the same data on a single VM, and there was no problem :)
Then I tried it on a real web server cluster, and the following happened :(
15:54 [username@servername]$ mahout clusterdump -i video_tags_kmean_job/clusters/clusters-10-final -o ~/video_tags_clusters_dump -p video_tags_kmean_job/clusters/clusteredPoints -dt sequencefile -d video_tags_kmean_job/vectors/dictionary.file-0 -n 50
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using /opt/hadoop/default/bin/hadoop and HADOOP_CONF_DIR=/opt/hadoop/default/etc/hadoop
MAHOUT-JOB: /home/username/apps/mahout-distribution-0.9/mahout-examples-0.9-job.jar
14/11/04 15:55:30 ERROR common.AbstractJob: Unexpected sequencefile while processing Job-Specific Options:
Unexpected sequencefile while processing Job-Specific Options:
Usage:
  [--input <input> --output <output> --outputFormat <outputFormat> --substring
  <substring> --numWords <numWords> --pointsDir <pointsDir> --samplePoints
  <samplePoints> --dictionary <dictionary> --dictionaryType <dictionaryType>
  --evaluate --distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
  --startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --outputFormat (-of) outputFormat        The optional output format for the
                                           results. Options: TEXT, CSV, JSON
                                           or GRAPH_ML
  --substring (-b) substring               The number of chars of the
                                           asFormatString() to print
  --numWords (-n) numWords                 The number of top terms to print
  --pointsDir (-p) pointsDir               The directory containing points
                                           sequence files mapping input
                                           vectors to their cluster. If
                                           specified, then the program will
                                           output the points associated with
                                           a cluster
  --samplePoints (-sp) samplePoints        Specifies the maximum number of
                                           points to include _per_ cluster.
                                           The default is to include all
                                           points
  --dictionary (-d) dictionary             The dictionary file
  --dictionaryType (-dt) dictionaryType    The dictionary file type
                                           (text|sequencefile)
  --evaluate (-e)                          Run ClusterEvaluator and
                                           CDbwEvaluator over the input. The
                                           output will be appended to the
                                           rest of the output at the end.
  --distanceMeasure (-dm) distanceMeasure  The classname of the
                                           DistanceMeasure. Default is
                                           SquaredEuclidean
  --help (-h)                              Print out help
  --tempDir tempDir                        Intermediate output directory
  --startPhase startPhase                  First phase to run
  --endPhase endPhase                      Last phase to run
14/11/04 15:55:31 INFO driver.MahoutDriver: Program took 439 ms (Minutes: 0.007316666666666667)
PS: I configured Mahout in /home/username/.bashrc:
# set mahout path
export MAHOUT_HOME=/home/username/apps/mahout-distribution-0.9
export MAHOUT_LOCAL=
export PATH=$PATH:$MAHOUT_HOME/bin
export CLASSPATH=$CLASSPATH:$MAHOUT_HOME/mahout-core-0.9.jar:$MAHOUT_HOME/mahout-math-0.9.jar:$MAHOUT_HOME/mahout-integration-0.9.jar
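Note that `export MAHOUT_LOCAL=` exports the variable with an empty value, which is consistent with the "MAHOUT_LOCAL is not set" line in the log above. A minimal sketch of that behaviour, assuming the launcher uses a non-empty-string test of this kind (I have not confirmed this against the mahout script; `mahout_local_state` is a hypothetical helper used only for illustration):

```shell
# Hypothetical helper: shows how a "non-empty string" test sees MAHOUT_LOCAL.
# Assumption: the mahout launcher decides local vs. Hadoop mode with a
# test of this kind; this is not copied from the real script.
mahout_local_state() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "set"
  else
    echo "not set"
  fi
}

# Same line as in my .bashrc: the variable is exported, but empty.
export MAHOUT_LOCAL=
mahout_local_state
```

So with this .bashrc the job should be running against Hadoop rather than in local mode, if the assumption above holds.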
Thanks a lot
Spike