Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2011/03/31 21:04:05 UTC

[jira] [Commented] (MAHOUT-646) Cannot run Wikipedia example on Amazon Elastic MapReduce (EMR)

    [ https://issues.apache.org/jira/browse/MAHOUT-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014144#comment-13014144 ] 

Sean Owen commented on MAHOUT-646:
----------------------------------

OK. Commenting out the line means the output format is not being used at all. Is that valid? (I don't know.)
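For context, I assume the line in question is the driver call that wires in the custom output format, something like the hypothetical fragment below; this is not the actual Mahout driver code, just an illustration of what commenting it out would do.

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Hypothetical driver fragment: the point is only that removing the
// setOutputFormatClass() call makes the job fall back to the default
// TextOutputFormat, so the custom format is never exercised at all.
public class DriverSketch {
  public static Job configureJob(Configuration conf) throws Exception {
    Job job = new Job(conf, "wikipedia dataset creator");
    job.setOutputFormatClass(
        org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.class);
    return job;
  }
}
{code}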
But in any event there seems to be an actual problem here:

{code}
...
at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
at org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
at org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
...
{code}

It's this attempt to create a file that's resulting in the "already being created" exception. Anyone have any ideas?
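One pattern worth checking against the Mahout code: a per-key output format normally caches one base record writer per generated file name, so it never asks HDFS to create the same path twice within a task. Below is a minimal sketch of that pattern against the Hadoop 0.20 "new" API; the class is hypothetical and is not Mahout's MultipleOutputFormat, just an illustration of the caching idea.

{code}
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

/**
 * Hypothetical sketch, not Mahout's implementation: write each key to its
 * own file, caching one writer per file name so FileSystem.create() runs
 * at most once per path.
 */
public class KeyedTextOutputFormat extends TextOutputFormat<Text, Text> {

  @Override
  public RecordWriter<Text, Text> getRecordWriter(final TaskAttemptContext context)
      throws IOException, InterruptedException {
    // Write under the task attempt's work directory so the committer can
    // promote the files if (and only if) the attempt succeeds.
    final Path workPath =
        ((FileOutputCommitter) getOutputCommitter(context)).getWorkPath();

    return new RecordWriter<Text, Text>() {
      // One underlying writer per derived file name. Without this cache a
      // second write for the same name calls create() on a path this task
      // already holds open, which is exactly the NameNode's complaint:
      // "current leaseholder is trying to recreate file".
      private final Map<String, RecordWriter<Text, Text>> writers =
          new HashMap<String, RecordWriter<Text, Text>>();

      @Override
      public void write(Text key, Text value)
          throws IOException, InterruptedException {
        String name = key.toString(); // derive the file name from the key
        RecordWriter<Text, Text> writer = writers.get(name);
        if (writer == null) {
          Path file = new Path(workPath, name);
          // overwrite=false: creating an existing file fails fast rather
          // than silently clobbering another writer's output.
          FSDataOutputStream out =
              file.getFileSystem(context.getConfiguration()).create(file, false);
          writer = new LineRecordWriter<Text, Text>(out);
          writers.put(name, writer);
        }
        writer.write(key, value);
      }

      @Override
      public void close(TaskAttemptContext c)
          throws IOException, InterruptedException {
        for (RecordWriter<Text, Text> w : writers.values()) {
          w.close(c);
        }
      }
    };
  }
}
{code}

If MultipleOutputFormat$1.write builds a fresh base writer for a name it has already opened, that would line up with both messages in the report: "current leaseholder is trying to recreate file" and "either because the filename is invalid or the file exists".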

> Cannot run Wikipedia example on Amazon Elastic MapReduce (EMR)
> --------------------------------------------------------------
>
>                 Key: MAHOUT-646
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-646
>             Project: Mahout
>          Issue Type: Bug
>          Components: Classification
>    Affects Versions: 0.5
>            Reporter: Martin Provencher
>            Priority: Minor
>
> When I tried to run the Wikipedia example on EMR with all the categories in the Wikipedia dump, I got this error:
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000 for DFSClient_attempt_201103292134_0010_r_000000_0 on client 10.240.10.157 because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1045)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.create(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.create(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
>     at org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
>     at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create file /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000 on client 10.240.10.157 either because the filename is invalid or the file exists
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1092)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:981)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:377)
>     at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:961)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:957)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:955)
>     at org.apache.hadoop.ipc.Client.call(Client.java:740)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:220)
>     at $Proxy1.create(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy1.create(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:2709)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:491)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:195)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:524)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:505)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:412)
>     at org.apache.hadoop.mapreduce.lib.output.TextOutputFormat.getRecordWriter(TextOutputFormat.java:128)
>     at org.apache.mahout.classifier.bayes.MultipleTextOutputFormat.getBaseRecordWriter(MultipleTextOutputFormat.java:41)
>     at org.apache.mahout.classifier.bayes.MultipleOutputFormat$1.write(MultipleOutputFormat.java:81)
>     at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.write(ReduceTask.java:517)
>     at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>     at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:35)
>     at org.apache.mahout.classifier.bayes.WikipediaDatasetCreatorReducer.reduce(WikipediaDatasetCreatorReducer.java:28)
>     at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
>     at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:575)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:412)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
>     ... 4 more
>     org.apache.hadoop.ipc.RemoteException: java.io.IOException: failed to create file /yatter.tagger/wikipedia/input/_temporary/_attempt__0000_r_000000_0/part-r-00000 on client 10.240.10.157 either because the filename is invalid or the file exists

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira