Posted to user@spark.apache.org by fanooos <de...@gmail.com> on 2016/11/01 07:16:44 UTC

Spark Job Failed with FileNotFoundException

I have a Spark cluster consisting of 5 nodes, and I have a Spark job that should
process the files in a directory and send their contents to Kafka.

I am trying to submit the job using the following command

bin$ ./spark-submit --total-executor-cores 20 --executor-memory 5G \
    --class org.css.java.FileMigration.FileSparkMigrator \
    --master spark://spark-master:7077 \
    /home/me/FileMigrator-0.1.1-jar-with-dependencies.jar \
    /home/me/shared kafka01,kafka02,kafka03,kafka04,kafka05 kafka_topic
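For reference, the job itself does roughly the following (a simplified sketch,
not the exact FileSparkMigrator source; the producer settings and the broker
port shown in the comments are placeholders):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class FileSparkMigrator {
    public static void main(String[] args) {
        String inputDir = args[0]; // e.g. /home/me/shared
        String brokers  = args[1]; // e.g. kafka01,...,kafka05
        String topic    = args[2]; // e.g. kafka_topic

        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("FileSparkMigrator"));

        // Read every file in the directory. With a file:// path, the files
        // must be readable at the same location on the driver and on every
        // executor node, which is why the directory is mounted everywhere.
        sc.textFile("file://" + inputDir)
          .foreachPartition(lines -> {
              // One producer per partition, created on the executor.
              // Placeholder settings: bootstrap.servers needs host:port
              // entries, e.g. kafka01:9092.
              Properties props = new Properties();
              props.put("bootstrap.servers", brokers);
              props.put("key.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
              props.put("value.serializer",
                        "org.apache.kafka.common.serialization.StringSerializer");
              try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                  while (lines.hasNext()) {
                      producer.send(new ProducerRecord<>(topic, lines.next()));
                  }
              }
          });

        sc.stop();
    }
}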


The directory /home/me/shared is mounted on all 5 nodes, but when I submit
the job I get the following exception:

java.io.FileNotFoundException: File file:/home/me/shared/input_1.txt does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:534)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:747)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:524)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:409)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:140)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:341)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:766)
    at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:108)
    at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
    at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:239)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:216)
    at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:88)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)



After some more tries, I noticed another odd behavior. When I submit the job
while the directory contains only 1 or 2 files, the same exception is thrown
on the driver machine, but the job continues and the files are processed
successfully. As soon as I add another file, the exception is thrown again and
the job fails.





