Posted to user@spark.apache.org by Paolo Platter <pa...@agilelab.it> on 2014/10/27 20:39:49 UTC

Spark Shell strange worker Exception

Hi all,

I’m submitting a simple task from the Spark shell against a CassandraRDD (DataStax environment); a sketch of the session is included after the stack trace below.
I’m getting the following exception from one of the workers:


INFO 2014-10-27 14:08:03 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:03 Remoting: Starting remoting
INFO 2014-10-27 14:08:03 Remoting: Remoting started; listening on addresses :[akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 Remoting: Remoting now listens on addresses: [akka.tcp://sparkExecutor@10.105.111.130:50234]
INFO 2014-10-27 14:08:03 org.apache.spark.executor.CoarseGrainedExecutorBackend: Connecting to driver: akka.tcp://spark@srv02.pocbgsia.ats-online.it:39797/user/CoarseGrainedScheduler
INFO 2014-10-27 14:08:03 org.apache.spark.deploy.worker.WorkerWatcher: Connecting to worker akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.deploy.worker.WorkerWatcher: Successfully connected to akka.tcp://sparkWorker@10.105.111.130:34467/user/Worker
INFO 2014-10-27 14:08:04 org.apache.spark.executor.CoarseGrainedExecutorBackend: Successfully registered with driver
INFO 2014-10-27 14:08:04 org.apache.spark.executor.Executor: Using REPL class URI: http://159.8.18.11:51705
INFO 2014-10-27 14:08:04 akka.event.slf4j.Slf4jLogger: Slf4jLogger started
INFO 2014-10-27 14:08:04 Remoting: Starting remoting
INFO 2014-10-27 14:08:04 Remoting: Remoting started; listening on addresses :[akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 Remoting: Remoting now listens on addresses: [akka.tcp://spark@10.105.111.130:49243]
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to BlockManagerMaster: akka.tcp://spark@srv02.pocbgsia.ats-online.it:39797/user/BlockManagerMaster
INFO 2014-10-27 14:08:04 org.apache.spark.storage.DiskBlockManager: Created local directory at /usr/share/dse/spark/tmp/executor/spark-local-20141027140804-4d84
INFO 2014-10-27 14:08:04 org.apache.spark.storage.MemoryStore: MemoryStore started with capacity 23.0 GB.
INFO 2014-10-27 14:08:04 org.apache.spark.network.ConnectionManager: Bound socket to port 50542 with id = ConnectionManagerId(10.105.111.130,50542)
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Trying to register BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.storage.BlockManagerMaster: Registered BlockManager
INFO 2014-10-27 14:08:04 org.apache.spark.SparkEnv: Connecting to MapOutputTracker: akka.tcp://spark@srv02.pocbgsia.ats-online.it:39797/user/MapOutputTracker
INFO 2014-10-27 14:08:04 org.apache.spark.HttpFileServer: HTTP File server directory is /usr/share/dse/spark/tmp/executor/spark-a23656dc-efce-494b-875a-a1cf092c3230
INFO 2014-10-27 14:08:04 org.apache.spark.HttpServer: Starting HTTP Server
INFO 2014-10-27 14:08:27 org.apache.spark.executor.CoarseGrainedExecutorBackend: Got assigned task 0
INFO 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Running task ID 0
ERROR 2014-10-27 14:08:28 org.apache.spark.executor.Executor: Exception in task ID 0
java.lang.ClassNotFoundException: com.datastax.bdp.spark.CassandraRDD
        at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:49)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.ClassLoader.loadClass(Unknown Source)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Unknown Source)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:37)
        at java.io.ObjectInputStream.readNonProxyDesc(Unknown Source)
        at java.io.ObjectInputStream.readClassDesc(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
        at org.apache.spark.scheduler.ResultTask$.deserializeInfo(ResultTask.scala:63)
        at org.apache.spark.scheduler.ResultTask.readExternal(ResultTask.scala:139)
        at java.io.ObjectInputStream.readExternalData(Unknown Source)
        at java.io.ObjectInputStream.readOrdinaryObject(Unknown Source)
        at java.io.ObjectInputStream.readObject0(Unknown Source)
        at java.io.ObjectInputStream.readObject(Unknown Source)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:40)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:62)
        at org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:193)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:45)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:176)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.io.FileNotFoundException: http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class
        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at java.net.URL.openStream(Unknown Source)
        at org.apache.spark.repl.ExecutorClassLoader.findClass(ExecutorClassLoader.scala:55)
        ... 25 more
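
The session is essentially just the following (a sketch: the keyspace and table names are placeholders, and it assumes the cassandraTable helper that the DSE Spark shell provides):

    // sketch: assumes the DSE shell's preconfigured sc and its cassandraTable helper;
    // keyspace/table names are placeholders
    val rdd = sc.cassandraTable("my_keyspace", "my_table")  // a com.datastax.bdp.spark.CassandraRDD
    rdd.count()  // an action like this triggers the task that fails on the worker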

I don’t understand why a worker (private address 10.105.111.130; srv02.pocbgsia.ats-online.it) looks for a .class file at a public URL on the master node (http://159.8.18.11:51705/com/datastax/bdp/spark/CassandraRDD.class).

What am I missing?

Thanks in advance

Paolo


Re: Spark Shell strange worker Exception

Posted by Saket Kumar <sa...@bgch.co.uk>.
Hi Paolo,

Custom classes and jars are distributed across the Spark cluster via an HTTP server on the master when the absolute path of the application fat jar is specified in the spark-submit script. The Advanced Dependency Management section of https://spark.apache.org/docs/latest/submitting-applications.html explains this.
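
As an illustration (just a sketch; the app name and jar path below are placeholders), the same shipping mechanism can also be set up from code with SparkConf.setJars, which serves the listed jars to the executors over the driver's HTTP file server:

    import org.apache.spark.{SparkConf, SparkContext}

    // sketch: app name and jar path are placeholders
    val conf = new SparkConf()
      .setAppName("cassandra-test")
      .setJars(Seq("/path/to/my-app-assembly.jar"))  // served to executors over HTTP
    val sc = new SparkContext(conf)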

Could that be the reason the worker accesses the master? I don’t know the cause of the error, however.

Thanks,
Saket 


On 27 Oct 2014, at 19:39, Paolo Platter <pa...@agilelab.it> wrote:

> Hi all,
> 
> I’m submitting a simple task from the Spark shell against a CassandraRDD (DataStax environment).
> I’m getting the following exception from one of the workers:
> [...]