Posted to user@mahout.apache.org by Eli Finkelshteyn <ie...@gmail.com> on 2012/04/18 00:01:35 UTC

How to Test that Mahout Job is Running on Hadoop

Hi Folks,
This sounds like a bit of a stupid question, but here goes anyway: I'm 
currently running ItemSimilarityJob 
<https://builds.apache.org/job/Mahout-Quality/javadoc/org/apache/mahout/cf/taste/hadoop/similarity/item/ItemSimilarityJob.html>. 
I've specified the input and output directories explicitly as HDFS paths 
(using hdfs:// URIs), so I know it's reading at least the input from 
HDFS. Nevertheless, the job is using a lot of memory on the client 
machine I'm running it from (the full 5 GB I gave it, along with a 
lot of CPU), and when I go to port 50030 on my namenode, I don't see 
any running jobs. So, am I doing something wrong and the job is running 
locally, or is it actually running on Hadoop? Whatever the answer, 
how do I check?

The command I ran to get things going was along the lines of:

MAVEN_OPTS="-Xms5g -Xmx5g -server" mvn exec:java 
-Dexec.mainClass=org.apache.mahout.cf.taste.hadoop.similarity.item.ItemSimilarityJob 
-Dexec.args="-Dmapred.input.dir=hdfs://mynamenode/tmp/input 
-Dmapred.output.dir=hdfs://mynamenode/tmp/output --similarityClassname 
SIMILARITY_LOGLIKELIHOOD --booleanData --maxSimilaritiesPerItem 20 
--maxPrefsPerUser 100 --minPrefsPerUser 3 --tempDir /tmp/hadoop-crap"

Thanks,
Eli

Re: How to Test that Mahout Job is Running on Hadoop

Posted by Sean Owen <sr...@gmail.com>.

The namenode is a component of HDFS, not of MapReduce. Go to the Hadoop
JobTracker to see jobs!
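
One way to check which mode a client would use, sketched under assumptions: in Hadoop releases of this era, the mapred.job.tracker property defaults to "local", which runs the job in-process on the client instead of submitting it to the cluster. If the cluster's mapred-site.xml is not visible to the JVM that mvn exec:java starts, that default may be what applies, which would fit the symptoms described. The helper names and the host name below are hypothetical, and the script only inspects the XML text it is given; it does not talk to Hadoop.

```python
import xml.etree.ElementTree as ET


def job_tracker_setting(site_xml_text, default="local"):
    """Return the mapred.job.tracker value found in Hadoop site XML text.

    Falls back to `default` when the property is absent, mirroring the
    assumed Hadoop default of "local".
    """
    root = ET.fromstring(site_xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "mapred.job.tracker":
            return prop.findtext("value", default).strip()
    return default


def runs_locally(site_xml_text):
    """True when the configuration would select the in-process local runner."""
    return job_tracker_setting(site_xml_text) == "local"


# Hypothetical mapred-site.xml contents; the host and port are placeholders.
example = """<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker.example.com:8021</value>
  </property>
</configuration>"""

print(job_tracker_setting(example))
print(runs_locally(example))
```

Running this against the mapred-site.xml that the client actually loads (if any) shows whether a JobTracker address is configured. An empty configuration, or one without the property, reports local mode.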
