Posted to user@spark.apache.org by Karen Murphy <k....@qub.ac.uk> on 2014/12/12 14:04:21 UTC

...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles

When I try to load a text file from an HDFS path using sc.wholeTextFiles("hdfs://localhost:54310/graphx/anywebsite.com/anywebsite.com/")

I get the following error:

java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
(full stack trace at bottom of message).
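
For reference, the call is essentially the following (a minimal sketch; sc is the SparkContext, e.g. the one spark-shell provides, and the variable name and the count() action are only for illustration):

// sc is an existing SparkContext (e.g. the one spark-shell provides)
val pages = sc.wholeTextFiles("hdfs://localhost:54310/graphx/anywebsite.com/anywebsite.com/")
println(pages.count())   // wholeTextFiles is lazy, so the exception only appears once an action runs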

If I switch my Scala code to read the input from the local disk, wholeTextFiles doesn't pick up directories (such as css in this case) and no exception is raised.

The trace information in the 'local file' version shows that only plain text files are collected with sc.wholeTextFiles:

14/12/12 11:51:29 INFO WholeTextFileRDD: Input split: Paths:/tmp/anywebsite.com/anywebsite.com/index-2.html:0+6192,/tmp/anywebsite.com/anywebsite.com/gallery.html:0+3258,/tmp/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/tmp/anywebsite.com/anywebsite.com/jquery.html:0+326,/tmp/anywebsite.com/anywebsite.com/index.html:0+6174,/tmp/anywebsite.com/anywebsite.com/contact.html:0+3050,/tmp/anywebsite.com/anywebsite.com/archive.html:0+3247

Yet the trace information in the 'HDFS file' version shows that directories are also collected by sc.wholeTextFiles:

14/12/12 11:49:07 INFO WholeTextFileRDD: Input split: Paths:/graphx/anywebsite.com/anywebsite.com/archive.html:0+3247,/graphx/anywebsite.com/anywebsite.com/contact.html:0+3050,/graphx/anywebsite.com/anywebsite.com/css:0+0,/graphx/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/graphx/anywebsite.com/anywebsite.com/gallery.html:0+3258,/graphx/anywebsite.com/anywebsite.com/highslide:0+0,/graphx/anywebsite.com/anywebsite.com/highslideIndex:0+0,/graphx/anywebsite.com/anywebsite.com/images:0+0,/graphx/anywebsite.com/anywebsite.com/index-2.html:0+6192,/graphx/anywebsite.com/anywebsite.com/index.html:0+6174,/graphx/anywebsite.com/anywebsite.com/jquery.html:0+326,/graphx/anywebsite.com/anywebsite.com/js:0+0
14/12/12 11:49:07 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)

Should the HDFS version of wholeTextFiles behave the same as the local version as far as the treatment of directories/non-plain-text files is concerned?

Any help, advice or workaround suggestions would be much appreciated,

Thanks
Karen

VERSION INFO
Ubuntu 14.04
Spark 1.1.1
Hadoop 2.5.2
Scala 2.10.4

FULL STACK TRACE
14/12/12 12:02:31 INFO WholeTextFileRDD: Input split: Paths:/graphx/anywebsite.com/anywebsite.com/archive.html:0+3247,/graphx/anywebsite.com/anywebsite.com/contact.html:0+3050,/graphx/anywebsite.com/anywebsite.com/css:0+0,/graphx/anywebsite.com/anywebsite.com/exhibitions.html:0+6663,/graphx/anywebsite.com/anywebsite.com/gallery.html:0+3258,/graphx/anywebsite.com/anywebsite.com/highslide:0+0,/graphx/anywebsite.com/anywebsite.com/highslideIndex:0+0,/graphx/anywebsite.com/anywebsite.com/images:0+0,/graphx/anywebsite.com/anywebsite.com/index-2.html:0+6192,/graphx/anywebsite.com/anywebsite.com/index.html:0+6174,/graphx/anywebsite.com/anywebsite.com/jquery.html:0+326,/graphx/anywebsite.com/anywebsite.com/js:0+0
14/12/12 12:02:31 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1167)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1155)
        at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:1145)
        at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:268)
        at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:235)
        at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:228)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1318)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:293)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:289)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:289)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:764)
        at org.apache.spark.input.WholeTextFileRecordReader.nextKeyValue(WholeTextFileRecordReader.scala:60)
        at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.nextKeyValue(CombineFileRecordReader.java:69)
        at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:138)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
        at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
        at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
        at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1143)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
        at org.apache.spark.scheduler.Task.run(Task.scala:54)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)

        at org.apache.hadoop.ipc.Client.call(Client.java:1411)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy10.getBlockLocations(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getBlockLocations(ClientNamenodeProtocolTranslatorPB.java:225)
        at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:1165)
        ... 37 more
14/12/12 12:02:31 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost): java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:68)
        at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:54)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1795)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsInt(FSNamesystem.java:1738)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1718)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1690)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:519)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:337)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)


Re: ...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles

Posted by Karen Murphy <k....@qub.ac.uk>.
Thanks Akhil,

In line with your suggestion, I have used the following two commands to
flatten the directory structure:

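# move every regular file from the subdirectories up into the current directory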
find . -type f -iname '*' -exec  mv '{}' . \;
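# then remove the directories that are left behind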
find . -type d -exec rm -rf '{}' \;

Kind Regards
Karen



On 12/12/14 13:25, Akhil Das wrote:
> I'm not quite sure whether Spark will go inside subdirectories and
> pick up files from them. You could do something like the following to bring
> all the files into one directory.
>
>         find . -iname '*' -exec mv '{}' . \;
>
>
> Thanks
> Best Regards


Re: ...FileNotFoundException: Path is not a file: - error on accessing HDFS with sc.wholeTextFiles

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
I'm not quite sure whether Spark will go inside subdirectories and pick up
files from them. You could do something like the following to bring all the
files into one directory.

find . -iname '*' -exec mv '{}' . \;
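
If you want to double-check what wholeTextFiles is actually seeing on HDFS, a rough (untested) Scala sketch along these lines, using the standard Hadoop FileSystem API, would list the input directory and flag which entries are directories (this assumes the default filesystem in your Hadoop configuration is that HDFS instance):

import org.apache.hadoop.fs.{FileSystem, Path}

// list the input directory and show which entries are directories
val fs = FileSystem.get(sc.hadoopConfiguration)
fs.listStatus(new Path("/graphx/anywebsite.com/anywebsite.com"))
  .foreach(status => println(s"${status.getPath}  isDirectory=${status.isDirectory}"))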


Thanks
Best Regards

On Fri, Dec 12, 2014 at 6:34 PM, Karen Murphy <k....@qub.ac.uk> wrote:
>
> When I try to load a text file from an HDFS path using
> sc.wholeTextFiles("hdfs://localhost:54310/graphx/anywebsite.com/anywebsite.com/")
>
> I get the following error:
>
> java.io.FileNotFoundException: Path is not a file: /graphx/anywebsite.com/anywebsite.com/css