Posted to user@spark.apache.org by Karen Murphy <k....@qub.ac.uk> on 2014/12/12 14:04:21 UTC
...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles
When I try to load a text file from an HDFS path using sc.wholeTextFiles("hdfs://localhost:54310/graphx/anywebsite.com/anywebsite.com/")
I get the following error:
java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
(full stack trace at bottom of message).
If I switch my Scala code to reading the input file from the local disk, wholeTextFiles doesn't pick up directories (such as css in this case), and no exception is raised.
The trace information in the 'local file' version shows that only plain text files are collected with sc.wholeTextFiles:
14/12/12 11:51:29 INFO WholeTextFileRDD: Input split: Paths:/tmp/anywebsite.com/anywebsite.com/index-2.html:0+6192,/tmp/anywebsite.com/anywebsite.com/gallery.html:0+3258,/tmp/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/tmp/anywebsite.com/anywebsite.com/jquery.html:0+326,/tmp/anywebsite.com/anywebsite.com/index.html:0+6174,/tmp/anywebsite.com/anywebsite.com/contact.html:0+3050,/tmp/anywebsite.com/anywebsite.com/archive.html:0+3247
Yet the trace information in the 'HDFS file' version shows that directories are also collected with sc.wholeTextFiles:
14/12/12 11:49:07 INFO WholeTextFileRDD: Input split: Paths:/graphx/anywebsite.com/anywebsite.com/archive.html:0+3247,/graphx/anywebsite.com/anywebsite.com/contact.html:0+3050,/graphx/anywebsite.com/anywebsite.com/css:0+0,/graphx/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/graphx/anywebsite.com/anywebsite.com/gallery.html:0+3258,/graphx/anywebsite.com/anywebsite.com/highslide:0+0,/graphx/anywebsite.com/anywebsite.com/highslideIndex:0+0,/graphx/anywebsite.com/anywebsite.com/images:0+0,/graphx/anywebsite.com/anywebsite.com/index-2.html:0+6192,/graphx/anywebsite.com/anywebsite.com/index.html:0+6174,/graphx/anywebsite.com/anywebsite.com/jquery.html:0+326,/graphx/anywebsite.com/anywebsite.com/js:0+0
14/12/12 11:49:07 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
Should the HDFS version of wholeTextFiles behave the same as the local version as far as the treatment of directories/non-plain-text files is concerned?
Any help, advice or workaround suggestions would be much appreciated,
Thanks
Karen
VERSION INFO
Ubuntu 14.04
Spark 1.1.1
Hadoop 2.5.2
Scala 2.10.4
FULL STACK TRACE
14/12/12 12:02:31 INFO WholeTextFileRDD: Input split: Paths:/graphx/anywebsite.com/anywebsite.com/archive.html:0+3247,/graphx/anywebsite.com/anywebsite.com/contact.html:0+3050,/graphx/anywebsite.com/anywebsite.com/css:0+0,/graphx/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/graphx/anywebsite.com/anywebsite.com/gallery.html:0+3258,/graphx/anywebsite.com/anywebsite.com/highslide:0+0,/graphx/anywebsite.com/anywebsite.com/highslideIndex:0+0,/graphx/anywebsite.com/anywebsite.com/images:0+0,/graphx/anywebsite.com/anywebsite.com/index-2.html:0+6192,/graphx/anywebsite.com/anywebsite.com/index.html:0+6174,/graphx/anywebsite.com/anywebsite.com/jquery.html:0+326,/graphx/anywebsite.com/anywebsite.com/js:0+0
14/12/12 12:02:31 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:60)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:138)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
at scala.collection.AbstractIterator.to(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
... 37 more
14/12/12 12:02:31 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
Re: ...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles
Posted by Karen Murphy <k....@qub.ac.uk>.
Thanks Akhil,
In line with your suggestion, I have used the following two commands to
flatten the directory structure:
find . -type f -iname '*' -exec mv '{}' . \;
find . -type d -exec rm -rf '{}' \;
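An alternative that avoids restructuring the directory in HDFS would be to list only the plain files with the Hadoop FileSystem API and pass those paths to Spark. A minimal, untested sketch, assuming wholeTextFiles accepts a comma-separated path list the way textFile does:

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// List the input directory ourselves and keep plain files only, so that
// subdirectories such as css/ are never handed to the record reader.
val fs = FileSystem.get(new URI("hdfs://localhost:54310"), new Configuration())
val filesOnly = fs
  .listStatus(new Path("/graphx/anywebsite.com/anywebsite.com/"))
  .filter(_.isFile)                 // FileStatus.isFile skips directories
  .map(_.getPath.toString)

// Assumes wholeTextFiles takes a comma-separated list of inputs, as textFile does.
val pages = sc.wholeTextFiles(filesOnly.mkString(","))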
Kind Regards
Karen
On 12/12/14 13:25, Akhil Das wrote:
> I'm not quite sure whether Spark will go inside subdirectories and
> pick up files from them. You could do something like the following to bring
> all files to one directory.
>
> find . -iname '*' -exec mv '{}' . \;
>
>
> Thanks
> Best Regards
>
> On Fri, Dec 12, 2014 at 6:34 PM, Karen Murphy <k.l.murphy@qub.ac.uk> wrote:
> [...]
Re: ...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles
Posted by Akhil Das <ak...@sigmoidanalytics.com>.
I'm not quite sure whether Spark will go inside subdirectories and pick up
files from them. You could do something like the following to bring all files to
one directory.
find . -iname '*' -exec mv '{}' . \;
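If the files you want all share a suffix, a glob may also keep the directories out of the input without moving anything. A minimal sketch, assuming every wanted page ends in .html and that wholeTextFiles honours wildcards the way textFile does:

// Match only the .html files; css/, images/ etc. are never listed.
val pages = sc.wholeTextFiles(
  "hdfs://localhost:54310/graphx/anywebsite.com/anywebsite.com/*.html")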
Thanks
Best Regards
On Fri, Dec 12, 2014 at 6:34 PM, Karen Murphy <k....@qub.ac.uk> wrote:
> [...]