Posted to user@mahout.apache.org by james q <ja...@gmail.com> on 2011/01/07 17:22:18 UTC

Mahout Error with Amazon's Elastic Map Reduce

Hello everyone!

I'm having a little bit of trouble running Mahout algorithms on
Elastic MapReduce. I read the wiki page and followed the instructions
there. I created a 4-node EMR cluster (1 master, 3 slaves) using the
command:

elastic-mapreduce --create --alive --name name \
--log-uri s3n://$S3BUCKET/logs/ --num-instances=2 --enable-debugging \
--hadoop-version=0.20
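The prose above describes 1 master and 3 slaves, while the command passes --num-instances=2. As far as we understand EMR's instance counting, --num-instances is the total number of instances including the master, so 2 would yield one master and a single slave. A minimal sketch under that assumption, which prints the same create command for a given number of slaves (emr_create_cmd is a hypothetical helper, not part of the elastic-mapreduce CLI):

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the elastic-mapreduce CLI).
# Assumption: EMR counts the master node in --num-instances,
# so total instances = 1 master + N slaves.
emr_create_cmd() {
  local slaves="$1"
  local total=$((slaves + 1))
  # \$S3BUCKET is escaped so it is printed literally and only expands
  # when the printed command is actually run.
  printf '%s\n' "elastic-mapreduce --create --alive --name name --log-uri s3n://\$S3BUCKET/logs/ --num-instances=$total --enable-debugging --hadoop-version=0.20"
}
```

For example, emr_create_cmd 3 prints the command with --num-instances=4.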

I then created a step to run k-means clustering over some data:

elastic-mapreduce -j $JOBID \
--jar s3n://$S3BUCKET/mahout/mahout-core-0.4-SNAPSHOT-job.jar \
--main-class org.apache.mahout.clustering.kmeans.KMeansDriver \
--arg --input    --arg s3n://$S3BUCKET/input/vectors/ \
--arg --clusters --arg s3n://$S3BUCKET/input/clusters/ \
--arg --output   --arg s3n://$S3BUCKET/output/kmeans/ \
--arg -k --arg 1000 \
--arg --distanceMeasure \
--arg org.apache.mahout.common.distance.CosineDistanceMeasure \
--arg --convergenceDelta --arg 0.1 \
--arg --overwrite \
--arg --maxIter --arg 5 \
--arg --clustering \
--step-name "MAHOUT Run"

So, the main issue I'm having is that using the mahout-core job file
leads to a missing-class error:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/CardinalityException

However, using the mahout-examples job file at least lets it run on
the EMR cluster ... but it seems that only one slave in the cluster
does any processing.

Has anyone seen something like this before? Is it possible the core
job file is not being created correctly? Any thoughts would be
appreciated!

Re: Mahout Error with Amazon's Elastic Map Reduce

Posted by Grant Ingersoll <gs...@apache.org>.
On Jan 7, 2011, at 11:22 AM, james q wrote:

> So, the main issue I'm having is that using the mahout-core job file
> leads to a missing-class error:
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/mahout/math/CardinalityException

Hmm, that may be a bug on our end. I haven't seen that one. Can you check the job file: is that class in it? You may just need to repackage it.

> However, using the mahout-examples job file at least lets it run on
> the EMR cluster ... but it seems that only one slave in the cluster
> does any processing.

How much data were you giving it?
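One likely reason data size matters here: Hadoop runs one map task per input split, so an input that fits in a single split is handled by a single task on a single node. A rough sketch of the per-file split arithmetic in Hadoop 0.20's FileInputFormat, assuming the split size equals the block size (64 MB here, Hadoop 0.20's default HDFS block size; the effective value for s3n input may differ) and including Hadoop's 10% slop on the final split (estimate_splits is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough count of map tasks FileInputFormat would
# create for ONE non-empty input file, assuming splitSize == block size
# and SPLIT_SLOP = 1.1 (the last split may be up to 10% oversized).
estimate_splits() {
  local remaining="$1"            # file size in bytes
  local split="${2:-67108864}"    # split size in bytes, default 64 MB
  local splits=0
  # Carve off full splits while remaining/split > 1.1 (integer form).
  while (( remaining * 10 > split * 11 )); do
    splits=$((splits + 1))
    remaining=$((remaining - split))
  done
  # Whatever is left becomes the final split.
  if (( remaining > 0 )); then
    splits=$((splits + 1))
  fi
  echo "$splits"
}
```

For example, estimate_splits $((70 * 1024 * 1024)) prints 1: a 70 MB file still yields a single split, and so a single map task, under these assumptions.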

-Grant


--------------------------
Grant Ingersoll
http://www.lucidimagination.com