Posted to user@mahout.apache.org by Martin Provencher <mp...@gmail.com> on 2011/03/29 20:35:52 UTC

Wikipedia example FileNotFoundException for categories

Hi all,
    I'm trying to run the Wikipedia example on Amazon Elastic MapReduce. I'm using
https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Example and
https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html as sources.

My problem occurs when I run:
./elastic-mapreduce --jar
s3://yatter.tagger/tagger/mahout-examples-0.5-SNAPSHOT-job.jar \
    --main-class org.apache.mahout.driver.MahoutDriver \
    --arg wikipediaDataSetCreator  \
    --arg -i --arg s3n://yatter.tagger/wikipedia/chunks \
    --arg -o --arg /yatter.tagger/wikipedia/input \
    --arg -c --arg /yatter.tagger/tagger/categories.txt \
    -j JOB_ID \

I always get:

Exception in thread "main" java.io.FileNotFoundException:
categories.txt (No such file or directory)
	at java.io.FileInputStream.open(Native Method)
	at java.io.FileInputStream.<init>(FileInputStream.java:106)
	at org.apache.mahout.common.FileLineIterator.getFileInputStream(FileLineIterator.java:100)
	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:53)
	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:43)
	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:166)
	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:117)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)


I've tried using s3://yatter.tagger/tagger/categories.txt,
s3n://yatter.tagger/tagger/categories.txt, and
http://yatter.tagger.s3.amazonaws.com/tagger/categories.txt.
I've also tried copying the file directly into HDFS with hadoop dfs
-copyFromLocal categories.txt /yatter.tagger/tagger/categories.txt after
downloading it, but that didn't work either.
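
For what it's worth, the stack trace above goes through java.io.FileInputStream,
which only resolves plain paths on the local filesystem of the host running the
driver, so an s3:// or HDFS URI passed to -c gets treated as a literal (and
missing) local file name. A quick way to check where the file actually lives,
assuming SSH access to the EMR master node (the paths below just reuse the ones
from this thread):

$ ls -l /yatter.tagger/tagger/categories.txt                  # local filesystem on the master
$ hadoop dfs -ls /yatter.tagger/tagger/categories.txt         # HDFS
$ hadoop dfs -ls s3n://yatter.tagger/tagger/categories.txt    # S3 via the s3n connector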

What am I missing here?

Martin

Re: Wikipedia example FileNotFoundException for categories

Posted by Martin Provencher <mp...@gmail.com>.
I've created a JIRA issue as you suggested:
https://issues.apache.org/jira/browse/MAHOUT-646

The content of the diff file was:
===================================================================
--- examples/src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorDriver.java  (revision 1087334)
+++ examples/src/main/java/org/apache/mahout/classifier/bayes/WikipediaDatasetCreatorDriver.java  (working copy)
@@ -185,7 +185,7 @@
     //TODO: job.setNumMapTasks(100);
     job.setInputFormatClass(XmlInputFormat.class);
     job.setReducerClass(WikipediaDatasetCreatorReducer.class);
-    job.setOutputFormatClass(WikipediaDatasetCreatorOutputFormat.class);
+    //job.setOutputFormatClass(WikipediaDatasetCreatorOutputFormat.class);

     FileInputFormat.setInputPaths(job, new Path(input));
     Path outPath = new Path(output);
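
For anyone who wants to apply this, here is a sketch of rebuilding and
re-uploading the patched job jar. The Maven command and the examples/target
path are what a stock Mahout 0.5 source checkout should use, and s3cmd is only
one way to upload; adjust to your own setup:

# From the root of the patched Mahout source tree, rebuild the examples job jar.
$ mvn -DskipTests clean install

# Upload the rebuilt jar over the one the EMR step points at
# (any S3 upload tool will do).
$ s3cmd put examples/target/mahout-examples-0.5-SNAPSHOT-job.jar \
      s3://yatter.tagger/tagger/mahout-examples-0.5-SNAPSHOT-job.jar

# Then re-run the same elastic-mapreduce command as before.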

Regards,

Martin

On Thu, Mar 31, 2011 at 1:57 PM, Ted Dunning <te...@gmail.com> wrote:

> Pastebin and dropbox may also be helpful for you to share files.
>
>
> On Thu, Mar 31, 2011 at 10:49 AM, Sean Owen <sr...@gmail.com> wrote:
>
>> Files don't come through to the user@ list. Can you describe the change,
>> or
>> even file a JIRA issue?
>>
>> On Thu, Mar 31, 2011 at 4:19 PM, Martin Provencher
>> <mp...@gmail.com>wrote:
>>
>> > The problem was the MultipleTextOutputFormat in the Wikipedia example. I
>> > commented the line specifying it and it works. I've attached a diff file
>> so
>> > if anyone else got the same problem, it would be able to run the example
>> on
>> > EMR. You'll need high memory instances to be able to run it though.
>> >
>> > Martin
>> >
>> >
>>
>
>

Re: Wikipedia example FileNotFoundException for categories

Posted by Ted Dunning <te...@gmail.com>.
Pastebin and dropbox may also be helpful for you to share files.

On Thu, Mar 31, 2011 at 10:49 AM, Sean Owen <sr...@gmail.com> wrote:

> Files don't come through to the user@ list. Can you describe the change,
> or
> even file a JIRA issue?
>
> On Thu, Mar 31, 2011 at 4:19 PM, Martin Provencher
> <mp...@gmail.com>wrote:
>
> > The problem was the MultipleTextOutputFormat in the Wikipedia example. I
> > commented the line specifying it and it works. I've attached a diff file
> so
> > if anyone else got the same problem, it would be able to run the example
> on
> > EMR. You'll need high memory instances to be able to run it though.
> >
> > Martin
> >
> >
>

Re: Wikipedia example FileNotFoundException for categories

Posted by Sean Owen <sr...@gmail.com>.
Files don't come through to the user@ list. Can you describe the change, or
even file a JIRA issue?

On Thu, Mar 31, 2011 at 4:19 PM, Martin Provencher
<mp...@gmail.com>wrote:

> The problem was the MultipleTextOutputFormat in the Wikipedia example. I
> commented the line specifying it and it works. I've attached a diff file so
> if anyone else got the same problem, it would be able to run the example on
> EMR. You'll need high memory instances to be able to run it though.
>
> Martin
>
>

Re: Wikipedia example FileNotFoundException for categories

Posted by Martin Provencher <mp...@gmail.com>.
I didn't know about those configuration properties. I'll try them the next time
I run the algorithm. I used m2.2xlarge instances instead of the m1.xlarge
instances and it worked, but those instances cost about double the others ...

Martin

On Fri, Apr 1, 2011 at 12:36 PM, Mat Kelcey <ma...@gmail.com>wrote:

> If you want to use one of the High-CPU instance types with Elastic
> MapReduce but you're having memory issues you could also try
> allocating more memory to the various hadoop daemons
>
> This can be done with a bootstrap action when you start your cluster
> $ elastic-mapreduce --create --master-instance-type c1.xlarge
> --slave-instance-type c1.xlarge --instance-count 10 --bootstrap-action
>
> s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive
>
> There's no guarantee this fixes every problem but it's worked for a
> number of people.
>
> See https://forums.aws.amazon.com/ann.jspa?annID=834 for more info
>
> --
> Mat
> Dev on Elastic MapReduce
>
> On 31 March 2011 15:19, Martin Provencher <mp...@gmail.com> wrote:
> > The problem was the MultipleTextOutputFormat in the Wikipedia example. I
> > commented the line specifying it and it works. I've attached a diff file
> so
> > if anyone else got the same problem, it would be able to run the example
> on
> > EMR. You'll need high memory instances to be able to run it though.
>

Re: Wikipedia example FileNotFoundException for categories

Posted by Mat Kelcey <ma...@gmail.com>.
If you want to use one of the High-CPU instance types with Elastic
MapReduce but you're having memory issues, you could also try
allocating more memory to the various Hadoop daemons.

This can be done with a bootstrap action when you start your cluster:
$ elastic-mapreduce --create --master-instance-type c1.xlarge
--slave-instance-type c1.xlarge --instance-count 10 --bootstrap-action
s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive

There's no guarantee this fixes every problem, but it has worked for a
number of people.

See https://forums.aws.amazon.com/ann.jspa?annID=834 for more info
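
Putting that bootstrap action together with the job step from earlier in the
thread, a single --create call might look like the sketch below. This assumes
the Ruby elastic-mapreduce CLI accepts the jar step and the bootstrap action in
the same invocation, and /home/hadoop/categories.txt is only a placeholder; per
the earlier messages, -c has to point at a file on the master node's local
filesystem:

$ elastic-mapreduce --create \
    --master-instance-type c1.xlarge --slave-instance-type c1.xlarge \
    --instance-count 10 \
    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/configurations/latest/memory-intensive \
    --jar s3://yatter.tagger/tagger/mahout-examples-0.5-SNAPSHOT-job.jar \
    --main-class org.apache.mahout.driver.MahoutDriver \
    --arg wikipediaDataSetCreator \
    --arg -i --arg s3n://yatter.tagger/wikipedia/chunks \
    --arg -o --arg /yatter.tagger/wikipedia/input \
    --arg -c --arg /home/hadoop/categories.txt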

--
Mat
Dev on Elastic MapReduce

On 31 March 2011 15:19, Martin Provencher <mp...@gmail.com> wrote:
> The problem was the MultipleTextOutputFormat in the Wikipedia example. I
> commented the line specifying it and it works. I've attached a diff file so
> if anyone else got the same problem, it would be able to run the example on
> EMR. You'll need high memory instances to be able to run it though.

Re: Wikipedia example FileNotFoundException for categories

Posted by Martin Provencher <mp...@gmail.com>.
The problem was the MultipleTextOutputFormat in the Wikipedia example. I
commented out the line specifying it and it works now. I've attached a diff file
so that if anyone else hits the same problem, they can run the example on EMR.
You'll need high-memory instances to run it, though.

Martin

On Wed, Mar 30, 2011 at 12:39 AM, Martin Provencher <mprovencher86@gmail.com
> wrote:

> I found my first problem. I need to chmod the file to let hadoop be able to
> read it. What I found very strange is that the file need to be on the master
> host filesystem and not in the HDFS. I though the HDFS was the default one
> ...
>
> My new problem seams to be related to the first one. When I run my command,
> I go to 100% map, 44% reduce then I get stuck there. I got this error :
>
> org.apache.hadoop.ipc.RemoteException:
> org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
> create file
> /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
> for DFSClient_attempt_201103292134_0010_r_000000_0 on client 10.240.10.157
> because current leaseholder is trying to recreate file.
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.create(Unknown Source)
>
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.create(Unknown Source)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
>     at
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at
> org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
>     at
> org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
>     at
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
>     at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
>     at
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to
> create file
> /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
> on client 10.240.10.157 either because the filename is invalid or the file
> exists
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1092)
>     at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
>     at
> org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.create(Unknown Source)
>
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.create(Unknown Source)
>     at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
>     at
> org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
>     at
> org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at
> org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
>     at
> org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
>     at
> org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
>     at
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
>     at
> org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at
> org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>
>     4 more :
>    org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to
> create file
> /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
> on client 10.240.10.157 either because the filename is invalid or the file
> exists
>
>
> I've tried to output the file in /yatter.tagger/wikipedia/input,
> hdfs:///yatter.tagger/wikipeda/input, s3n://yatter.tagger/wikipedia/input,
> file:///yatter.tagger/wikipedia/input. I got the same error in all those
> cases.
> When I try with : hdfs://localhost:9000/yatter.tagger/wikipedia/input,
> hdfs://127.0.0.1:9000/yatter.tagger/wikipedia/input, I got a connection
> denied.
>
> Anyone have seen this error before?
>
> Martin
>
> On Tue, Mar 29, 2011 at 2:35 PM, Martin Provencher <
> mprovencher86@gmail.com> wrote:
>
>> Hi all,
>>     I'm trying to run the wikipedia example on Amazon Elastic-MapReduce.
>> I'm using
>> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Exampleand
>> https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html as
>> sources.
>>
>> My problem is when I run :
>> ./elastic-mapreduce --jar
>> s3://yatter.tagger/tagger/mahout-examples-0.5-SNAPSHOT-job.jar \
>>     --main-class org.apache.mahout.driver.MahoutDriver \
>>     --arg wikipediaDataSetCreator  \
>>     --arg -i --arg s3n://yatter.tagger/wikipedia/chunks \
>>     --arg -o --arg /yatter.tagger/wikipedia/input \
>>     --arg -c --arg /yatter.tagger/tagger/categories.txt \
>>     -j JOB_ID \
>>
>> I always got :
>>
>> Exception in thread "main" java.io.FileNotFoundException: categories.txt (No such file or directory)
>> 	at java.io.FileInputStream.open(Native Method)
>> 	at java.io.FileInputStream.<init>(FileInputStream.java:106)
>>
>>
>> 	at org.apache.mahout.common.FileLineIterator.getFileInputStream(FileLineIterator.java:100)
>> 	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:53)
>> 	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:43)
>>
>>
>> 	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:166)
>> 	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:117)
>>
>>
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>
>>
>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>
>>
>> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>
>>
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> 	at java.lang.reflect.Method.invoke(Method.java:597)
>> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>>
>> I've tried to use s3://yatter.tagger/tagger/categories.txt,
>> s3n://yatter.tagger/tagger/categories.txt,
>> http://yatter.tagger.s3.amazonaws.com/tagger/categories.txt .
>> I've tried to copy the file directly on hadoop dfs with hadoop dfs
>> -copyFromLocal categories.txt /yatter.tagger/tagger/categories.txt after
>> downloading the file, but it didn't work.
>>
>> What do I miss here?
>>
>> Martin
>>
>
>

Re: Wikipedia example FileNotFoundException for categories

Posted by Martin Provencher <mp...@gmail.com>.
I found my first problem: I need to chmod the file so that Hadoop is able to
read it. What I find very strange is that the file needs to be on the master
host's filesystem and not in HDFS. I thought HDFS was the default one ...
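
A sketch of that workaround, assuming SSH access to the master node (the key
file name, master DNS name, and the /home/hadoop path are placeholders; EMR
clusters normally have the s3n filesystem configured, so the file can be pulled
straight from the bucket):

# From your workstation: log into the EMR master node.
$ ssh -i my-key.pem hadoop@MASTER_PUBLIC_DNS

# On the master: copy categories.txt from S3 onto the local filesystem and make
# it readable by the user running the job.
$ hadoop dfs -get s3n://yatter.tagger/tagger/categories.txt /home/hadoop/categories.txt
$ chmod 644 /home/hadoop/categories.txt

# Point the -c argument at the local path:
#     --arg -c --arg /home/hadoop/categories.txt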

My new problem seems to be related to the first one. When I run my command,
it goes to 100% map, 44% reduce and then gets stuck there. I get this error:

org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to
create file
/yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
for DFSClient_attempt_201103292134_0010_r_000000_0 on client 10.240.10.157
because current leaseholder is trying to recreate file.
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.create(Unknown Source)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
    at
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
    at
org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
    at
org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
    at
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
    at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
    at
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create
file
/yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
on client 10.240.10.157 either because the filename is invalid or the file
exists
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1092)
    at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
    at
org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
    at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)

    at org.apache.hadoop.ipc.Client.call(Client.java:740)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
    at $Proxy1.create(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
    at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
    at $Proxy1.create(Unknown Source)
    at
org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
    at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
    at
org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
    at
org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
    at
org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
    at
org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
    at
org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
    at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
    at
org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

    This is followed by 4 more occurrences of:
   org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to
create file
/yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000
on client 10.240.10.157 either because the filename is invalid or the file
exists


I've tried writing the output to /yatter.tagger/wikipedia/input,
hdfs:///yatter.tagger/wikipeda/input, s3n://yatter.tagger/wikipedia/input, and
file:///yatter.tagger/wikipedia/input. I get the same error in all of those
cases.
When I try hdfs://localhost:9000/yatter.tagger/wikipedia/input or
hdfs://127.0.0.1:9000/yatter.tagger/wikipedia/input, the connection is denied.
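
A quick way to see which filesystem the job treats as the default (and
therefore where a bare /yatter.tagger/... output path ends up) is to look at
fs.default.name on the master. A sketch, assuming SSH access and that the EMR
AMI keeps its Hadoop configuration under /home/hadoop/conf:

# fs.default.name is the URI Hadoop prepends to paths that have no scheme.
$ grep -A 1 fs.default.name /home/hadoop/conf/core-site.xml

# Confirm the default filesystem is reachable from the master
# (this also shows whether /yatter.tagger already exists there).
$ hadoop dfs -ls /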

Has anyone seen this error before?

Martin

On Tue, Mar 29, 2011 at 2:35 PM, Martin Provencher
<mp...@gmail.com>wrote:

> Hi all,
>     I'm trying to run the wikipedia example on Amazon Elastic-MapReduce.
> I'm using
> https://cwiki.apache.org/confluence/display/MAHOUT/Wikipedia+Bayes+Exampleand
> https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html as
> sources.
>
> My problem is when I run :
> ./elastic-mapreduce --jar
> s3://yatter.tagger/tagger/mahout-examples-0.5-SNAPSHOT-job.jar \
>     --main-class org.apache.mahout.driver.MahoutDriver \
>     --arg wikipediaDataSetCreator  \
>     --arg -i --arg s3n://yatter.tagger/wikipedia/chunks \
>     --arg -o --arg /yatter.tagger/wikipedia/input \
>     --arg -c --arg /yatter.tagger/tagger/categories.txt \
>     -j JOB_ID \
>
> I always got :
>
> Exception in thread "main" java.io.FileNotFoundException: categories.txt (No such file or directory)
> 	at java.io.FileInputStream.open(Native Method)
> 	at java.io.FileInputStream.<init>(FileInputStream.java:106)
>
> 	at org.apache.mahout.common.FileLineIterator.getFileInputStream(FileLineIterator.java:100)
> 	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:53)
> 	at org.apache.mahout.common.FileLineIterable.<init>(FileLineIterable.java:43)
>
> 	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.runJob(WikipediaDatasetCreatorDriver.java:166)
> 	at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorDriver.main(WikipediaDatasetCreatorDriver.java:117)
>
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> 	at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>
> 	at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
>
> I've tried to use s3://yatter.tagger/tagger/categories.txt,
> s3n://yatter.tagger/tagger/categories.txt,
> http://yatter.tagger.s3.amazonaws.com/tagger/categories.txt .
> I've tried to copy the file directly on hadoop dfs with hadoop dfs
> -copyFromLocal categories.txt /yatter.tagger/tagger/categories.txt after
> downloading the file, but it didn't work.
>
> What do I miss here?
>
> Martin
>