Posted to dev@zeppelin.apache.org by "Rahul Palamuttam (JIRA)" <ji...@apache.org> on 2016/03/03 03:03:18 UTC

[jira] [Created] (ZEPPELIN-714) Assigning spark context to variable results in task not serializable error

Rahul Palamuttam created ZEPPELIN-714:
-----------------------------------------

             Summary: Assigning spark context to variable results in task not serializable error
                 Key: ZEPPELIN-714
                 URL: https://issues.apache.org/jira/browse/ZEPPELIN-714
             Project: Zeppelin
          Issue Type: Bug
          Components: zeppelin-interpreter
    Affects Versions: 0.5.6
         Environment: Scala and Apache Spark
            Reporter: Rahul Palamuttam


[~chrismattmann]

We recently observed the following issue with Zeppelin:
assigning the Spark context (sc) to a new variable and then using that variable causes a "Task not serializable" exception. The same error occurs in spark-shell. However, submitting a compiled Scala or Java application that performs the same operation through spark-submit does not trigger the error.

The following three lines of code trigger the error in a Zeppelin notebook:

val newSC = sc
val temp = 10
val rdd = newSC.parallelize(0 to 10).map(p => p + temp)

It appears that either sc or newSC is being captured in the referencing environment of the closure passed to map. Note that if we replace "newSC.parallelize" with "sc.parallelize", the error goes away.
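
One plausible explanation, which we have not confirmed against the Zeppelin or Spark REPL sources: the interpreter compiles each notebook line into a generated wrapper, so `temp` is read as a field of that wrapper and the closure captures the wrapper itself, which also holds `newSC`. The Spark-free sketch below is a simplified model of that assumption. `FakeContext` (standing in for SparkContext), `LineWrapper`, `TransientLineWrapper` and `ClosureCheck` are all hypothetical names, and each closure is checked with plain Java serialization, which is what Spark's default closure serializer uses.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately not Serializable.
class FakeContext

// Simplified, hypothetical model of a REPL line wrapper. The notebook's vals
// become fields, so reading `temp` inside the lambda means reading `this.temp`,
// and the lambda captures the whole wrapper, including `newSC`.
class LineWrapper extends Serializable {
  val newSC: FakeContext = new FakeContext
  val temp: Int = 10
  val addTemp: Int => Int = p => p + temp
}

// Same wrapper, but Java serialization skips the @transient context field.
class TransientLineWrapper extends Serializable {
  @transient val newSC: FakeContext = new FakeContext
  val temp: Int = 10
  val addTemp: Int => Int = p => p + temp
}

object ClosureCheck {
  // Returns true if `obj` can be written with plain Java serialization.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

In this model, `ClosureCheck.isSerializable(new LineWrapper().addTemp)` fails because of the captured context, while the `@transient` variant succeeds. If the model holds for the notebook too, declaring the alias as `@transient val newSC = sc` may be a workaround worth trying; we have not verified it on 0.5.6.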

We came across this while integrating SciSpark with Zeppelin.
SciSpark has its own SciSparkContext, which is a thin wrapper around the SparkContext; we pass the SparkContext to the SciSparkContext through its constructor. The code for the class is here: https://github.com/SciSpark/SciSpark/blob/master/src/main/scala/org/dia/core/SciSparkContext.scala
SciSparkContext does not extend SparkContext; it simply holds a SparkContext as a member and uses it to read various file formats into RDDs.
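
Because a wrapper like SciSparkContext keeps the SparkContext as a member, the same kind of capture could also occur inside the wrapper whenever a closure built in one of its methods reads a member field. The sketch below uses a hypothetical `WrapperContext` class (not the actual SciSparkContext code) and again a `FakeContext` stand-in for SparkContext. It shows that pattern alongside the fix the Spark programming guide suggests for functions that reference fields: copy the field into a local `val` before building the closure.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for SparkContext: deliberately not Serializable.
class FakeContext

// Hypothetical wrapper that holds the context as a member. It is not
// Serializable itself, so any closure that captures it cannot be shipped.
class WrapperContext(val context: FakeContext) {
  val offset: Int = 10

  // Reads the member `offset` (i.e. `this.offset`), so the lambda captures
  // the whole wrapper and, through it, the context.
  def addOffsetCapturingThis: Int => Int = p => p + offset

  // Copies the member into a local first, so the lambda captures only an Int.
  def addOffsetLocalCopy: Int => Int = {
    val localOffset = offset
    p => p + localOffset
  }
}

object WrapperCheck {
  // Returns true if `obj` can be written with plain Java serialization.
  def isSerializable(obj: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

The local-copy version leaves the wrapper out of the closure's environment entirely, so the fix does not depend on marking anything `@transient`.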




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)