Posted to dev@mahout.apache.org by Andrew Palumbo <ap...@outlook.com> on 2015/04/08 19:21:09 UTC
mahout 0.10.0 example scripts
I ran through all the large example scripts using all options:
cluster-reuters.sh, classify-20newsgroups.sh, classify-wikipedia.sh,
cluster-synthetic.sh and /examples/bin/run-rf.sh 1000, in both
MAHOUT_LOCAL=true and MAHOUT_LOCAL unset (cluster) modes.
I also ran factorize-movielens-1M.sh (uses MAHOUT_LOCAL=true only) and
spark-document-classifier.mscala (a mahout-shell script).
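For anyone who wants to repeat the same local/cluster matrix, here is a minimal sketch of a driver loop. It is hypothetical (not part of Mahout) and assumes the scripts live under examples/bin relative to the current directory (the EXAMPLES_DIR variable is made up for this sketch). factorize-movielens-1M.sh and the .mscala script are left out because they do not run in both modes. Any numbered option a script asks for still has to be chosen when it prompts. DRY_RUN=1 only prints the commands.

```shell
#!/usr/bin/env bash
# Hypothetical test-matrix driver (not part of Mahout): runs each example
# script once with MAHOUT_LOCAL=true and once with MAHOUT_LOCAL unset
# (cluster mode). Set DRY_RUN=1 to print the commands instead of running them.
EXAMPLES_DIR="${EXAMPLES_DIR:-examples/bin}"   # assumed location of the scripts

# Each entry is "script [args...]"; run-rf.sh takes 1000 as in the text.
SCRIPTS=(
  "cluster-reuters.sh"
  "classify-20newsgroups.sh"
  "classify-wikipedia.sh"
  "cluster-synthetic.sh"
  "run-rf.sh 1000"
)

run_matrix() {
  local entry mode
  local -a cmd
  for entry in "${SCRIPTS[@]}"; do
    # Split the entry on whitespace so script arguments are passed through.
    read -r -a cmd <<< "$entry"
    cmd[0]="$EXAMPLES_DIR/${cmd[0]}"
    for mode in local cluster; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$mode: ${cmd[*]}"
      elif [ "$mode" = local ]; then
        # The prefix assignment exports MAHOUT_LOCAL to this command only.
        MAHOUT_LOCAL=true "${cmd[@]}"
      else
        # Unset MAHOUT_LOCAL inside a subshell so the caller is unaffected.
        ( unset MAHOUT_LOCAL; "${cmd[@]}" )
      fi
    done
  done
}
```

Usage: `DRY_RUN=1 run_matrix` lists the ten runs; plain `run_matrix` executes them one after another.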
Setup:
Hadoop 2.4.1 pseudo-cluster using the default config from the Hadoop
configuration page.
Spark-1.1.1-bin-hadoop2.4 binaries, downloaded pre-compiled.
$MASTER env variable set, pointing to the Spark master URL.
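A rough sketch of the environment side of that setup, for reference. The install paths and the localhost master are placeholders I made up, not the values used in the runs above; only the spark://HOST:PORT form of a standalone master URL (7077 being Spark's default port) is standard.

```shell
# Hypothetical environment for the setup above; paths and host are
# placeholders, not values from the original test run.
export MAHOUT_HOME="${MAHOUT_HOME:-$HOME/mahout}"
export SPARK_HOME="${SPARK_HOME:-$HOME/spark-1.1.1-bin-hadoop2.4}"
# A standalone Spark master URL has the form spark://HOST:PORT (7077 by default).
export MASTER="${MASTER:-spark://localhost:7077}"
# Leave MAHOUT_LOCAL unset for cluster mode; export MAHOUT_LOCAL=true for local mode.
unset MAHOUT_LOCAL
```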
Current status is:
MAHOUT_LOCAL unset:
cluster-reuters -> (2) needs more yarn heap memory (noted in script)
classify-wikipedia -> (1) needs more yarn heap memory (noted in script)
MAHOUT_LOCAL=true:
cluster-reuters -> (1) fails due to local vs. cluster script issues;
noted in the script that this option runs from the example script in cluster mode only.
classify-20newsgroups -> (3),(4) exit gracefully with a message that
'MAHOUT_LOCAL=true' cannot be set. The same happens if $MASTER is not set.
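The graceful exit for options (3),(4) amounts to a guard like the one below. This is a hypothetical sketch of the idea, not the code actually in classify-20newsgroups.sh; the function name and messages are made up, and it treats any non-empty MAHOUT_LOCAL as local mode, which is an assumption.

```shell
# Hypothetical guard (not the actual script code): refuse to run a
# Spark-based option when MAHOUT_LOCAL is set or MASTER is missing.
require_spark_cluster_env() {
  # Assumption: any non-empty MAHOUT_LOCAL value means local mode.
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "This option cannot run with MAHOUT_LOCAL set; please unset MAHOUT_LOCAL." >&2
    return 1
  fi
  if [ -z "${MASTER:-}" ]; then
    echo "Please set MASTER to your Spark master URL." >&2
    return 1
  fi
  return 0
}
```

In a script it would be called as `require_spark_cluster_env || exit 1` before the Spark step, so the user sees the message instead of a stack trace.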
If anyone has problems running cluster-reuters.sh (this happens
occasionally, e.g. if the download doesn't complete), you should only need to
delete your /tmp/mahout-work-$user directory and run it again.
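If you hit this often, a small helper saves some typing. It is a hypothetical sketch: it assumes "$user" above corresponds to your login name ($USER), and it refuses to delete anything outside /tmp as a basic safety check.

```shell
# Hypothetical helper: remove the example work directory so the next run
# starts from scratch. Defaults to /tmp/mahout-work-$USER (assumed to match
# "$user" in the text); an explicit path may be passed instead.
reset_mahout_work_dir() {
  local work_dir="${1:-/tmp/mahout-work-${USER:-$(id -un)}}"
  case "$work_dir" in
    /tmp/?*) rm -rf -- "$work_dir" ;;
    *) echo "Refusing to delete $work_dir (not under /tmp)" >&2; return 1 ;;
  esac
}
```

Then run `reset_mahout_work_dir` and start cluster-reuters.sh again.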