Posted to dev@mahout.apache.org by Tharindu Mathew <mc...@gmail.com> on 2012/02/16 18:31:48 UTC
Running cluster dumper from trunk build
Hi,
I'm trying out the synthetic control example and noticed the cluster dumper
command located at [1] does not work.
I'd appreciate it if anyone can correct my command. It seems --seqFileDir is
deprecated; I tried a few other combinations of the command and none of them
worked.
[1] - https://cwiki.apache.org/confluence/display/MAHOUT/Cluster+Dumper
Files in HDFS at output:
$ bin/hadoop fs -lsr output
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:32
/user/mackie/output/clusteredPoints
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:32
/user/mackie/output/clusteredPoints/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31
/user/mackie/output/clusteredPoints/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31
/user/mackie/output/clusteredPoints/_logs/history
-rw-r--r-- 1 mackie supergroup 7105 2012-02-16 21:31
/user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_1329408095893_mackie_Canopy+Driver+running+clusterData+over+input%3A+outp
-rw-r--r-- 1 mackie supergroup 20634 2012-02-16 21:31
/user/mackie/output/clusteredPoints/_logs/history/job_201202162112_0005_conf.xml
-rw-r--r-- 1 mackie supergroup 340891 2012-02-16 21:31
/user/mackie/output/clusteredPoints/part-m-00000
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:31
/user/mackie/output/clusters-0-final
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:31
/user/mackie/output/clusters-0-final/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/clusters-0-final/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/clusters-0-final/_logs/history
-rw-r--r-- 1 mackie supergroup 10696 2012-02-16 21:30
/user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_1329408047297_mackie_Canopy+Driver+running+buildClusters+over+input%3A+ou
-rw-r--r-- 1 mackie supergroup 20920 2012-02-16 21:30
/user/mackie/output/clusters-0-final/_logs/history/job_201202162112_0004_conf.xml
-rw-r--r-- 1 mackie supergroup 6747 2012-02-16 21:31
/user/mackie/output/clusters-0-final/part-r-00000
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/data
-rw-r--r-- 1 mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/data/_SUCCESS
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/data/_logs
drwxr-xr-x - mackie supergroup 0 2012-02-16 21:30
/user/mackie/output/data/_logs/history
-rw-r--r-- 1 mackie supergroup 7063 2012-02-16 21:30
/user/mackie/output/data/_logs/history/job_201202162112_0003_1329408010408_mackie_Input+Driver+running+over+input%3A+testdata
-rw-r--r-- 1 mackie supergroup 19845 2012-02-16 21:30
/user/mackie/output/data/_logs/history/job_201202162112_0003_conf.xml
-rw-r--r-- 1 mackie supergroup 335470 2012-02-16 21:30
/user/mackie/output/data/part-m-00000
Here's my output:
$ $MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10
--pointsDir output/clusteredPoints --output
$MAHOUT_HOME/examples/output/clusteranalyze.txt
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using
HADOOP_HOME=/Users/mackie/devtools/hadoop-0.20.204.0
No HADOOP_CONF_DIR set, using /Users/mackie/devtools/hadoop-0.20.204.0/conf
MAHOUT-JOB:
/Users/mackie/source-checkouts/mahout-trunk/examples/target/mahout-examples-0.7-SNAPSHOT-job.jar
12/02/16 22:50:31 ERROR common.AbstractJob: Unexpected --seqFileDir while
processing Job-Specific Options:
usage: <command> [Generic Options] [Job-Specific Options]
Generic Options:
-archives <paths>              comma separated archives to be unarchived on the compute machines.
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-files <paths>                 comma separated files to be copied to the map reduce cluster
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-libjars <paths>               comma separated jar files to include in the classpath.
-tokenCacheFile <tokensFile>   name of the file with the tokens
Unexpected --seqFileDir while processing Job-Specific Options:
Usage:
[--input <input> --output <output> --outputFormat <outputFormat>
--substring <substring> --numWords <numWords> --pointsDir <pointsDir>
--samplePoints <samplePoints> --dictionary <dictionary>
--dictionaryType <dictionaryType> --evaluate
--distanceMeasure <distanceMeasure> --help --tempDir <tempDir>
--startPhase <startPhase> --endPhase <endPhase>]
Job-Specific Options:
  --input (-i) input                       Path to job input directory.
  --output (-o) output                     The directory pathname for output.
  --outputFormat (-of) outputFormat        The optional output format to write
                                           the results as. Options: TEXT, CSV
                                           or GRAPH_ML
  --substring (-b) substring               The number of chars of the
                                           asFormatString() to print
  --numWords (-n) numWords                 The number of top terms to print
  --pointsDir (-p) pointsDir               The directory containing points
                                           sequence files mapping input
                                           vectors to their cluster. If
                                           specified, then the program will
                                           output the points associated with a
                                           cluster
  --samplePoints (-sp) samplePoints        Specifies the maximum number of
                                           points to include _per_ cluster.
                                           The default is to include all
                                           points
  --dictionary (-d) dictionary             The dictionary file
  --dictionaryType (-dt) dictionaryType    The dictionary file type
                                           (text|sequencefile)
  --evaluate (-e)                          Run ClusterEvaluator and
                                           CDbwEvaluator over the input. The
                                           output will be appended to the rest
                                           of the output at the end.
  --distanceMeasure (-dm) distanceMeasure  The classname of the
                                           DistanceMeasure. Default is
                                           SquaredEuclidean
  --help (-h)                              Print out help
  --tempDir tempDir                        Intermediate output directory
  --startPhase startPhase                  First phase to run
  --endPhase endPhase                      Last phase to run
12/02/16 22:50:31 INFO driver.MahoutDriver: Program took 308 ms (Minutes:
0.0051333333333333335)
--
Regards,
Tharindu
blog: http://mackiemathew.com/
Re: Running cluster dumper from trunk build
Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Looks like it was just changed to -i (--input), likely for uniformity
with other CLI operations. The documentation needs to be updated.
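
Going by that, a corrected invocation would look like the sketch below, with
--seqFileDir replaced by -i/--input. Note also that the HDFS listing in the
original message contains output/clusters-0-final, not output/clusters-10, so
the cluster directory argument may need adjusting as well (this path
substitution is my assumption, not something stated in the thread):

```shell
# Sketch: same command as before, but using -i (--input) instead of the
# removed --seqFileDir option, and pointing at the clusters-0-final
# directory that actually appears in the HDFS listing.
$MAHOUT_HOME/bin/mahout clusterdump \
  -i output/clusters-0-final \
  --pointsDir output/clusteredPoints \
  --output $MAHOUT_HOME/examples/output/clusteranalyze.txt
```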
On 2/16/12 10:31 AM, Tharindu Mathew wrote: