Posted to dev@mahout.apache.org by Andrew Palumbo <ap...@outlook.com> on 2015/04/08 19:21:09 UTC

mahout 0.10.0 example scripts

I ran through all of the large example scripts with every option: 
cluster-reuters.sh, classify-20newsgroups.sh, classify-wikipedia.sh, 
cluster-synthetic.sh and /examples/bin/run-rf.sh 1000, in both 
MAHOUT_LOCAL=true and MAHOUT_LOCAL unset (cluster) modes.

I also ran factorize-movielens-1M.sh (uses MAHOUT_LOCAL=true only) and 
spark-document-classifier.mscala (a mahout-shell script).

Setup:
Hadoop 2.4.1 pseudo-cluster using the default config from the Hadoop 
configuration page.
Spark-1.1.1-bin-hadoop2.4 binaries, downloaded pre-compiled.
$MASTER env variable set, pointing to the Spark master URL.
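
For anyone reproducing the setup, here is a minimal sketch of switching between the two modes. The helper name mahout_mode and the master URL spark://localhost:7077 are assumptions for illustration only (substitute your own master URL), and the sketch assumes that any non-empty MAHOUT_LOCAL value selects local mode.

```shell
# Hypothetical helper (not part of Mahout): report which mode the
# example scripts would run in for the current environment.
# Assumption: any non-empty MAHOUT_LOCAL value means local mode.
mahout_mode() {
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "local"
  else
    echo "cluster"
  fi
}

# Local mode: run against the local filesystem.
export MAHOUT_LOCAL=true
mahout_mode

# Cluster mode: unset MAHOUT_LOCAL and point MASTER at the Spark master.
unset MAHOUT_LOCAL
export MASTER="spark://localhost:7077"   # assumed URL, adjust to your cluster
mahout_mode
```

With the variables set this way, the example scripts can be launched from the same shell session.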


Current status is:

MAHOUT_LOCAL unset:
   cluster-reuters -> option (2) needs more yarn heap memory (noted in script)
   classify-wikipedia -> option (1) needs more yarn heap memory (noted in script)

MAHOUT_LOCAL=true:
   cluster-reuters -> (1) fails due to local vs. cluster script issues; 
added a note to the script: (runs from this example script in cluster mode only)
   classify-20newsgroups -> (3) and (4) exit gracefully with a message that 
'MAHOUT_LOCAL=true' cannot be set. They exit the same way if $MASTER is not set.
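
The kind of check described for the Spark options could look roughly like the sketch below. This is not the text of the actual example script: the function name require_spark_cluster and the message wording are hypothetical, and it again assumes any non-empty MAHOUT_LOCAL counts as set.

```shell
# Hypothetical guard (not the actual script code): refuse to continue
# when the environment is not suitable for the Spark-based options.
require_spark_cluster() {
  # Assumption: any non-empty MAHOUT_LOCAL counts as "set".
  if [ -n "${MAHOUT_LOCAL:-}" ]; then
    echo "MAHOUT_LOCAL cannot be set for this option; unset it and retry." >&2
    return 1
  fi
  # The Spark options need MASTER to hold the Spark master URL.
  if [ -z "${MASTER:-}" ]; then
    echo "MASTER must be set to the Spark master URL for this option." >&2
    return 1
  fi
  return 0
}

# Usage inside a script (commented out so sourcing this block has no effect):
# require_spark_cluster || exit 0
```

Returning a status from the function and leaving the exit to the caller keeps the check reusable for more than one option.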

If anyone has problems running cluster-reuters.sh (this sometimes 
happens, e.g. if the download doesn't complete), you should just need to 
delete your /tmp/mahout-work-$user directory and run it again.
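
A minimal sketch of that cleanup, assuming $user above stands for your login name as held in $USER. The helper name clean_mahout_work and its optional parent-directory argument are hypothetical and only there to make the snippet easy to try out safely.

```shell
# Hypothetical helper: remove the examples' work directory so the next
# run starts from a clean download.
# $1 optionally overrides the parent directory (defaults to /tmp).
clean_mahout_work() {
  parent="${1:-/tmp}"
  # Assumption: the directory suffix is the login name; fall back to
  # `id -un` if USER is not set.
  name="${USER:-$(id -un)}"
  work_dir="${parent}/mahout-work-${name}"
  rm -rf -- "${work_dir}"
  echo "removed ${work_dir}"
}

# Usage (commented out so sourcing this block deletes nothing):
# clean_mahout_work
```

After the directory is removed, re-running cluster-reuters.sh should download the data again.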