Posted to issues@spark.apache.org by "Younos Aboulnaga (JIRA)" <ji...@apache.org> on 2016/04/14 19:27:25 UTC

[jira] [Created] (SPARK-14638) Threads of Spark Streaming (with Kafka) lose sight of the executor classpath

Younos Aboulnaga created SPARK-14638:
----------------------------------------

             Summary: Threads of Spark Streaming (with Kafka) lose sight of the executor classpath
                 Key: SPARK-14638
                 URL: https://issues.apache.org/jira/browse/SPARK-14638
             Project: Spark
          Issue Type: Bug
          Components: Streaming
    Affects Versions: 1.6.1, 1.6.0, 1.4.1, 1.2.1
         Environment: uname -a: 
Linux HOSTNAME 3.13.0-74-generic #118-Ubuntu SMP Thu Dec 17 22:52:10 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

java -version:
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
            Reporter: Younos Aboulnaga


I am pretty sure that code executed within a foreachRDD closure does not have access to the classes on the executor classpath, at least for a ReceiverInputDStream created using 'KafkaUtils.createStream'. I have been looking into this problem for a few days now, and I can comfortably claim that the Spark Streaming worker does not have access to the resources added by setting 'spark.executor.extraClassPath' or by adding jars to 'spark.jars'. Here is why:
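
For reference, here is a minimal sketch of the kind of job that hits this. The paths, Zookeeper quorum, group, and topic names are placeholders, not my exact code; the class loaded in the closure matches the error shown below:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object ClasspathRepro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("classpath-repro")
      // Resources the closure below should be able to see (placeholder paths)
      .set("spark.executor.extraClassPath", "/opt/app/conf:/opt/app/lib/*")
      .set("spark.jars", "/opt/app/lib/hbase-client.jar")

    val ssc = new StreamingContext(conf, Seconds(10))

    // Receiver-based stream, as in this report (quorum/group/topic are placeholders)
    val stream = KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))

    stream.foreachRDD { rdd =>
      rdd.foreachPartition { records =>
        // Runs on the executor: referencing a class shipped via the extra
        // jars is exactly what fails with NoClassDefFoundError, even though
        // the jar sits in the worker directory (see below).
        val cls = Class.forName("org.apache.hadoop.hbase.protobuf.ProtobufUtil")
        records.foreach(r => println(s"$cls processed: $r"))
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}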

1) Even though there is a 'log4j.properties' on the 'spark.executor.extraClassPath', the first line of the worker's stderr says "Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties"

2) Even though the jars in the worker directory contain the class, the job fails with a NoClassDefFoundError. Here is the specific example in my case (a classloader probe is sketched after the listing below):

> grep NoClassDef workers/app-20160414111328-0043/0/stderr

Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.ProtobufUtil
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.ProtobufUtil
... SEVERAL ATTEMPTS ...
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hadoop.hbase.protobuf.ProtobufUtil 

Even so, in the same application's worker dir:
> for j in workers/app-20160414111328-0043/0/*.jar ; do jar tf $j | grep ProtobufUtil ; done;

org/apache/hadoop/hbase/protobuf/ProtobufUtil$1.class
org/apache/hadoop/hbase/protobuf/ProtobufUtil.class
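
To check what the executor-side thread can actually see, a probe along these lines can be run inside the closure (a sketch, not the exact code from my job; note that "Could not initialize class" means the class was found but its static initializer failed earlier, so the probe forces initialization too):

stream.foreachRDD { rdd =>
  rdd.foreachPartition { _ =>
    // Which classloader does the executor-side thread use?
    val ctx = Thread.currentThread().getContextClassLoader
    println(s"context classloader: $ctx")

    // 1) Is the log4j.properties from extraClassPath visible? (prints null if not)
    println(s"log4j.properties -> ${ctx.getResource("log4j.properties")}")

    // 2) Can the class from the worker-dir jars be loaded AND initialized?
    try {
      Class.forName("org.apache.hadoop.hbase.protobuf.ProtobufUtil", true, ctx)
      println("ProtobufUtil loaded and initialized fine")
    } catch {
      case t: Throwable => println(s"load/init failed: $t")
    }
  }
}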

There are other examples, especially of configurations not being found. I think SPARK-12279 may share the same root cause.


