Posted to user@mahout.apache.org by Fernando Santos <fe...@gmail.com> on 2013/11/12 23:56:50 UTC

Check if mahout is indeed running on Hadoop

Hello everyone,

I have configured a Hadoop 1.2.1 single-node cluster and installed Mahout
0.8.

The node seems to be working correctly.

I'm trying to run the 20newsgroups Mahout example on the Hadoop cluster
using the cnaivebayes classifier. The problem is that I'm getting the
following error:

13/11/12 18:31:46 INFO common.AbstractJob: Command line arguments:
{--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647],
--fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter],
--input=[/tmp/mahout-work-hduser/20news-all], --keyPrefix=[],
--method=[mapreduce], --output=[/tmp/mahout-work-hduser/20news-seq],
--overwrite=null, --startPhase=[0], --tempDir=[temp]}
Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /tmp/mahout-work-hduser/20news-all
    at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at
org.apache.mahout.text.SequenceFilesFromDirectory.runMapReduce(SequenceFilesFromDirectory.java:140)
    at
org.apache.mahout.text.SequenceFilesFromDirectory.run(SequenceFilesFromDirectory.java:89)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at
org.apache.mahout.text.SequenceFilesFromDirectory.main(SequenceFilesFromDirectory.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at
org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

When I check the permissions of the folder I get this:
hduser@fernandoPC:/usr/local/mahout/core/target$ ls -l
/tmp/mahout-work-hduser/
total 14136
drwxr-xr-x 22 hduser hadoop     4096 Nov 12 18:31 20news-all
drwxr-xr-x  4 hduser hadoop     4096 Nov 12 18:09 20news-bydate
-rw-r--r--  1 hduser hadoop 14464277 Nov 12 18:09 20news-bydate.tar.gz

When I run the 20newsgroups example choosing the sgd classifier, it works
correctly. I think that's because it does not use map/reduce tasks, so it
is not even running on Hadoop.

Maybe it is something related to user access. I can run it as the root
user, but I'm not sure it runs correctly then. While it runs, I can't see
any map/reduce jobs on the JobTracker
(http://localhost:50030/jobtracker.jsp), so I think it might be running
locally rather than on the Hadoop cluster. Does that make sense? I
actually don't know whether the running tasks should show up on this
JobTracker page.
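From what I can tell from reading the bin/mahout script in 0.8, it decides between local and cluster execution based on environment variables, so I checked mine (the comments are my own understanding of the script, not official docs):

```shell
# bin/mahout (Mahout 0.8) runs in-process against the local filesystem when
# MAHOUT_LOCAL is set to anything non-empty; otherwise it submits work
# through $HADOOP_HOME's hadoop command, and jobs should then appear on the
# JobTracker page while they execute.
echo "MAHOUT_LOCAL=${MAHOUT_LOCAL:-<unset, should use Hadoop>}"
echo "HADOOP_HOME=${HADOOP_HOME:-<unset>}"
```

In my case HADOOP_HOME is set and MAHOUT_LOCAL is not, so I'd expect the jobs to go to the cluster.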

Anyway, I've been trying to solve this for days and have searched Google a
lot without finding any help. Does anyone have any idea?

PS: I'm totally new to Hadoop and Mahout.

-- 
Fernando Santos
+55 61 8129 8505

Re: Check if mahout is indeed running on Hadoop

Posted by Fernando Santos <fe...@gmail.com>.
Hello Suneel,

Thank you for the tip. It was indeed that bug, and adding "-xm sequential"
solved the problem. But then I got a similar error while testing the
classifier (./bin/mahout testnb). It seems to be another error about
permissions. Maybe another bug? =P

13/11/13 17:39:21 WARN driver.MahoutDriver: No testnb.props found on
classpath, will use command-line arguments only
13/11/13 17:39:21 INFO common.AbstractJob: Command line arguments:
{--endPhase=[2147483647],
--input=[/tmp/mahout-work-hduser/20news-train-vectors],
--labelIndex=[/tmp/mahout-work-hduser/labelindex],
--model=[/tmp/mahout-work-hduser/model],
--output=[/tmp/mahout-work-hduser/20news-testing], --overwrite=null,
--startPhase=[0], --tempDir=[temp], --testComplementary=null}
13/11/13 17:39:22 INFO mapred.JobClient: Cleaning up the staging area
hdfs://localhost:54310/app/hadoop/tmp/mapred/staging/hduser/.staging/job_201311131709_0036
13/11/13 17:39:22 ERROR security.UserGroupInformation:
PriviledgedActionException as:hduser cause:java.io.FileNotFoundException:
File does not exist: /tmp/mahout-work-hduser/model
Exception in thread "main" java.io.FileNotFoundException: File does not
exist: /tmp/mahout-work-hduser/model
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:558)
    at org.apache.hadoop.filecache.DistributedCache.getFileStatus(DistributedCache.java:185)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getFileStatus(TrackerDistributedCacheManager.java:723)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.determineTimestamps(TrackerDistributedCacheManager.java:792)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.determineTimestampsAndCacheVisibilities(TrackerDistributedCacheManager.java:755)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:843)
    at org.apache.hadoop.mapred.JobClient.copyAndConfigureFiles(JobClient.java:734)
    at org.apache.hadoop.mapred.JobClient.access$400(JobClient.java:179)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:951)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:550)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.runMapReduce(TestNaiveBayesDriver.java:141)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.run(TestNaiveBayesDriver.java:109)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.mahout.classifier.naivebayes.test.TestNaiveBayesDriver.main(TestNaiveBayesDriver.java:66)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:194)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:160)


I checked the working directory, and indeed the model folder wasn't
created. So I think the problem is that the training step never generated
this /model folder.

hduser@fernandoPC:/usr/local/mahout$ ls /tmp/mahout-work-hduser/
20news-all  20news-bydate  20news-bydate.tar.gz

hduser@fernandoPC:/usr/local/mahout$ ls -l /tmp
drwxr-xr-x  4 hduser   hadoop      4096 Nov 13 17:35 mahout-work-hduser
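Since testnb is submitted as a MapReduce job, I suppose it resolves /tmp/mahout-work-hduser against HDFS rather than the local disk, so I also compared the two views (guarding the hadoop call in case it isn't on the PATH):

```shell
# Local filesystem view (what the `ls` above shows):
ls /tmp/mahout-work-hduser/ 2>/dev/null || echo "no local copy"

# HDFS view - this is what a MapReduce-based step like testnb actually
# reads, so the model directory has to exist *here*, not just on local disk:
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -ls /tmp/mahout-work-hduser/
else
  echo "hadoop not on PATH"
fi
```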


Also, while training the classifier, three map task attempts failed with
the exceptions below. I don't know whether they are relevant to the error
or not:

13/11/13 17:38:58 INFO mapred.JobClient: Task Id :
attempt_201311131709_0035_m_000000_0, Status : FAILED
java.lang.IllegalArgumentException
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:76)
    at org.apache.mahout.classifier.naivebayes.training.WeightsMapper.setup(WeightsMapper.java:44)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

13/11/13 17:39:03 INFO mapred.JobClient: Task Id :
attempt_201311131709_0035_m_000000_1, Status : FAILED
java.lang.IllegalArgumentException (stack trace identical to the first attempt)

13/11/13 17:39:09 INFO mapred.JobClient: Task Id :
attempt_201311131709_0035_m_000000_2, Status : FAILED
java.lang.IllegalArgumentException (stack trace identical to the first attempt)

13/11/13 17:39:16 INFO mapred.JobClient: Job complete: job_201311131709_0035



Any ideas?

Thanks!




-- 
Fernando Santos
+55 61 8129 8505

Re: Check if mahout is indeed running on Hadoop

Posted by Suneel Marthi <su...@yahoo.com>.
Hi Fernando,

This could be related to a bug (see MAHOUT-1319) in 'seqdirectory', wherein it ignores the 'PrefixFilter' argument.
While this should be fixed in Mahout 0.9, could you try modifying the following in classify-20newsgroups.sh

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow

to read as

  echo "Creating sequence files from 20newsgroups data"
  ./bin/mahout seqdirectory \
    -i ${WORK_DIR}/20news-all \
    -o ${WORK_DIR}/20news-seq -ow -xm sequential


Please give that a try.




