Posted to user@spark.apache.org by Nick Travers <n....@gmail.com> on 2015/03/30 07:34:46 UTC

java.io.FileNotFoundException when using HDFS in cluster mode

Hi List,

I'm following the mini complete example here:
<https://github.com/databricks/learning-spark/tree/master/mini-complete-example>
with the following:

$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts

The jar is submitted fine and I can see it appear on the driver node (i.e.
connecting to and reading from HDFS ok):

-rw-r--r-- 1 nickt nickt  15K Mar 29 22:05
learning-spark-mini-example_2.10-0.0.1.jar
-rw-r--r-- 1 nickt nickt 9.2K Mar 29 22:05 stderr
-rw-r--r-- 1 nickt nickt    0 Mar 29 22:05 stdout

But it's failing due to a java.io.FileNotFoundException saying my input file
is missing:

Caused by: java.io.FileNotFoundException: Added file
file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
does not exist.

I'm using sc.addFile("hdfs://path/to/the_file.txt") to propagate the file to
all the workers and sc.textFile(SparkFiles.get("the_file.txt")) to get the
local path to the file on each of the hosts.
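
For context, the relevant part of my WordCount main looks roughly like this (a
sketch; argument handling and names approximate):

import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    // args(0) is the input path, e.g. hdfs://host.domain.ex/user/nickt/linkage
    sc.addFile(args(0))
    // SparkFiles.get resolves the locally fetched copy of the added file by name
    val input = sc.textFile(SparkFiles.get("linkage"))
    val counts = input.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    // args(1) is the output path, e.g. hdfs://host.domain.ex/user/nickt/wordcounts
    counts.saveAsTextFile(args(1))
  }
}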

Has anyone come up against this before when reading from HDFS? No doubt I'm
doing something wrong.

Full trace below:

Launch Command: "/usr/java/java8/bin/java" "-cp"
":/home/nickt/spark-1.3.0/conf:/home/nickt/spark-1.3.0/assembly/target/scala-2.10/spark-assembly-1.3.0-hadoop2.0.0-mr1-cdh4.6.0.jar"
"-Dakka.loglevel=WARNING" "-Dspark.driver.supervise=false"
"-Dspark.app.name=com.oreilly.learningsparkexamples.mini.scala.WordCount"
"-Dspark.akka.askTimeout=10"
"-Dspark.jars=hdfs://host.domain.ex/user/nickt/learning-spark-mini-example_2.10-0.0.1.jar"
"-Dspark.master=spark://host.domain.ex:7077" "-Xms512M" "-Xmx512M"
"org.apache.spark.deploy.worker.DriverWrapper"
"akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker"
"/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/learning-spark-mini-example_2.10-0.0.1.jar"
"com.oreilly.learningsparkexamples.mini.scala.WordCount"
"hdfs://host.domain.ex/user/nickt/linkage"
"hdfs://host.domain.ex/user/nickt/wordcounts"
========================================

log4j:WARN No appenders could be found for logger
(org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
more info.
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(nickt); users
with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'Driver' on port
44201.
15/03/29 22:05:05 INFO WorkerWatcher: Connecting to worker
akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO SparkContext: Running Spark version 1.3.0
15/03/29 22:05:05 INFO SecurityManager: Changing view acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: Changing modify acls to: nickt
15/03/29 22:05:05 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(nickt); users
with modify permissions: Set(nickt)
15/03/29 22:05:05 INFO Slf4jLogger: Slf4jLogger started
15/03/29 22:05:05 INFO Utils: Successfully started service 'sparkDriver' on
port 33382.
15/03/29 22:05:05 INFO SparkEnv: Registering MapOutputTracker
15/03/29 22:05:05 INFO SparkEnv: Registering BlockManagerMaster
15/03/29 22:05:05 INFO DiskBlockManager: Created local directory at
/tmp/spark-9c52eb1e-92b9-4e3f-b0e9-699a158f8e40/blockmgr-222a2522-a0fc-4535-a939-4c14d92dc666
15/03/29 22:05:05 INFO WorkerWatcher: Successfully connected to
akka.tcp://sparkWorker@host5.domain.ex:40830/user/Worker
15/03/29 22:05:05 INFO MemoryStore: MemoryStore started with capacity 265.1
MB
15/03/29 22:05:05 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-031afddd-2a75-4232-931a-89e502b0d722/httpd-7e22bb57-3cfe-4c89-aaec-4e6ca1a65f66
15/03/29 22:05:05 INFO HttpServer: Starting HTTP Server
15/03/29 22:05:05 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:05 INFO AbstractConnector: Started
SocketConnector@0.0.0.0:42484
15/03/29 22:05:05 INFO Utils: Successfully started service 'HTTP file
server' on port 42484.
15/03/29 22:05:05 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/29 22:05:06 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/29 22:05:06 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
15/03/29 22:05:06 INFO Utils: Successfully started service 'SparkUI' on port
4040.
15/03/29 22:05:06 INFO SparkUI: Started SparkUI at
http://host5.domain.ex:4040
15/03/29 22:05:06 ERROR SparkContext: Jar not found at
target/scala-2.10/learning-spark-mini-example_2.10-0.0.1.jar
15/03/29 22:05:06 INFO AppClient$ClientActor: Connecting to master
akka.tcp://sparkMaster@host.domain.ex:7077/user/Master...
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Connected to Spark
cluster with app ID app-20150329220506-0027
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/0 on worker-20150329112422-host3.domain.ex-33765
(host3.domain.ex:33765) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/0 on hostPort host3.domain.ex:33765 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/1 on worker-20150329112422-host6.domain.ex-35464
(host6.domain.ex:35464) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/1 on hostPort host6.domain.ex:35464 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/2 on worker-20150329112422-host2.domain.ex-40914
(host2.domain.ex:40914) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/2 on hostPort host2.domain.ex:40914 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/3 on worker-20150329112421-host4.domain.ex-35927
(host4.domain.ex:35927) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/3 on hostPort host4.domain.ex:35927 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/4 on worker-20150329112422-host1.domain.ex-60546
(host1.domain.ex:60546) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/4 on hostPort host1.domain.ex:60546 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/5 on worker-20150329112421-host.domain.ex-59485
(host.domain.ex:59485) with 64 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/5 on hostPort host.domain.ex:59485 with 64 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor added:
app-20150329220506-0027/6 on worker-20150329112421-host5.domain.ex-40830
(host5.domain.ex:40830) with 63 cores
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20150329220506-0027/6 on hostPort host5.domain.ex:40830 with 63 cores,
512.0 MB RAM
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/2 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/0 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/1 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/4 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/3 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/5 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/0 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/1 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/2 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/6 is now LOADING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/3 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/4 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/5 is now RUNNING
15/03/29 22:05:06 INFO AppClient$ClientActor: Executor updated:
app-20150329220506-0027/6 is now RUNNING
15/03/29 22:05:06 INFO NettyBlockTransferService: Server created on 39447
15/03/29 22:05:06 INFO BlockManagerMaster: Trying to register BlockManager
15/03/29 22:05:06 INFO BlockManagerMasterActor: Registering block manager
host5.domain.ex:39447 with 265.1 MB RAM, BlockManagerId(<driver>,
host5.domain.ex, 39447)
15/03/29 22:05:06 INFO BlockManagerMaster: Registered BlockManager
15/03/29 22:05:06 INFO SparkDeploySchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.0
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at
org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:59)
    at
org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: java.io.FileNotFoundException: Added file
file:/home/nickt/spark-1.3.0/work/driver-20150329220503-0021/hdfs:/host.domain.ex/user/nickt/linkage
does not exist.
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1089)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1065)
    at
com.oreilly.learningsparkexamples.mini.scala.WordCount$.main(WordCount.scala:21)
    at
com.oreilly.learningsparkexamples.mini.scala.WordCount.main(WordCount.scala)
    ... 6 more





Re: java.io.FileNotFoundException when using HDFS in cluster mode

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
What happens when you do:

sc.textFile("hdfs://path/to/the_file.txt")
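
i.e. skipping addFile/SparkFiles and reading straight off HDFS. A minimal
sanity check, assuming the same paths as in your submit command:

val input = sc.textFile("hdfs://host.domain.ex/user/nickt/linkage")
println(input.count())  // should print the line count if the HDFS path resolves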

Thanks
Best Regards

On Mon, Mar 30, 2015 at 11:04 AM, Nick Travers <n....@gmail.com>
wrote:

> [snip]

Re: java.io.FileNotFoundException when using HDFS in cluster mode

Posted by nsalian <ne...@gmail.com>.
Try running it like this:

sudo -u hdfs spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --deploy-mode cluster \
  --master yarn \
  hdfs:///user/spark/spark-examples-1.2.0-cdh5.3.2-hadoop2.5.0-cdh5.3.2.jar 10


Caveats:
1) Make sure the permissions on /user/nickt are 775 or 777.
2) No need for the hostname; try an hdfs:///path/to/jar URI, as sketched below.
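
Applied to your submit, that would look something like this (a sketch; assumes
fs.defaultFS on the cluster already points at host.domain.ex):

$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  hdfs:///user/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs:///user/nickt/linkage \
  hdfs:///user/nickt/wordcounts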





RE: java.io.FileNotFoundException when using HDFS in cluster mode

Posted by java8964 <ja...@hotmail.com>.
I think the jar file has to be local; a jar in HDFS is not supported yet in
Spark. See this answer:
http://stackoverflow.com/questions/28739729/spark-submit-not-working-when-application-jar-is-in-hdfs
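
If that's the cause, copying the jar to the local filesystem and submitting
from there should work, e.g. (hypothetical local path; the other arguments as
in your original command):

$SPARK_HOME/bin/spark-submit \
  --deploy-mode cluster \
  --master spark://host.domain.ex:7077 \
  --class com.oreilly.learningsparkexamples.mini.scala.WordCount \
  /home/nickt/learning-spark-mini-example_2.10-0.0.1.jar \
  hdfs://host.domain.ex/user/nickt/linkage \
  hdfs://host.domain.ex/user/nickt/wordcounts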

> Date: Sun, 29 Mar 2015 22:34:46 -0700
> From: n.e.travers@gmail.com
> To: user@spark.apache.org
> Subject: java.io.FileNotFoundException when using HDFS in cluster mode
> 
> [snip]