Posted to issues@spark.apache.org by "Rahul Palamuttam (JIRA)" <ji...@apache.org> on 2016/07/17 07:25:20 UTC

[jira] [Commented] (SPARK-13634) Assigning spark context to variable results in serialization error

    [ https://issues.apache.org/jira/browse/SPARK-13634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15381168#comment-15381168 ] 

Rahul Palamuttam commented on SPARK-13634:
------------------------------------------

Kai Chen, thank you, and I apologize for not responding sooner. This does resolve our issue.
As a little background:
We use a wrapper class around the SparkContext, and setting the SparkContext field inside the class to @transient did not resolve our issue.
Instead, attaching the @transient annotation to the instance of the wrapper class did.
Before:
val SciSc = new SciSparkContext(sc)
After:
@transient val SciSc = new SciSparkContext(sc)
We use the SciSparkContext wrapper class to delegate to methods like binaryFiles, so that file formats like NetCDF can be read while the extra details of actually parsing that format are abstracted away.
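
For anyone hitting the same thing, here is a minimal sketch of the pattern, not our actual code: SciSparkContext is reduced to a single delegating method, and netcdfBinaryFiles is a hypothetical name standing in for our NetCDF reader.

// Minimal sketch, not the real SciSparkContext: the wrapper simply
// delegates to SparkContext.binaryFiles; decoding the bytes as NetCDF
// is elided here.
import org.apache.spark.SparkContext
import org.apache.spark.input.PortableDataStream
import org.apache.spark.rdd.RDD

class SciSparkContext(@transient val sc: SparkContext) extends Serializable {
  // Reads raw file bytes keyed by path; NetCDF parsing would follow.
  def netcdfBinaryFiles(path: String): RDD[(String, PortableDataStream)] =
    sc.binaryFiles(path)
}

// Marking the field @transient (above) was not enough in the shell;
// the instance itself also has to be @transient so the REPL line
// object holding it is not pulled into serialized closures:
@transient val SciSc = new SciSparkContext(sc)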

Sean Owen and Chris A. Mattmann, thank you for allowing the JIRA to be re-opened.
I would like to resolve the issue, but first I wanted to point out that I didn't see much, if any, documentation on this behavior.
I was looking at the quick start here: http://spark.apache.org/docs/latest/quick-start.html#interactive-analysis-with-the-spark-shell
(I may have just missed it elsewhere.)
The spark-shell as a mode of interacting with Spark seems to be becoming more common, especially with notebook projects like Zeppelin (which we are using).
I do think this is worth pointing out and documenting, even if it is really an issue with Scala.
If we are in agreement, I would like to change this JIRA to a documentation JIRA and submit the patch (I've never submitted a doc patch and it would be a nice experience for me).
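
As a rough idea of what such a note could show, here is a sketch of a shell session based on the reproduction in this JIRA (the behavior is as reported below; the comments are mine):

// Sketch of a spark-shell session illustrating the pitfall. In the
// shell, each line is wrapped in a REPL object, so a closure that
// captures temp can transitively capture the alias of sc and fail
// with "Task not serializable":
val newSC = sc
val temp = 10
val rdd = newSC.parallelize(0 to 100).map(p => p + temp)  // fails in the shell

// Marking the alias @transient keeps it out of the serialized closure:
@transient val newSC2 = sc
val rdd2 = newSC2.parallelize(0 to 100).map(p => p + temp)  // works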

I'll also respond sooner next time.




> Assigning spark context to variable results in serialization error
> ------------------------------------------------------------------
>
>                 Key: SPARK-13634
>                 URL: https://issues.apache.org/jira/browse/SPARK-13634
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Shell
>            Reporter: Rahul Palamuttam
>            Priority: Minor
>
> The following lines of code cause a task serialization error when executed in the spark-shell. 
> Note that the error does not occur when the code is submitted as a batch job via spark-submit.
> val temp = 10
> val newSC = sc
> val newRDD = newSC.parallelize(0 to 100).map(p => p + temp)
> For some reason, when temp is pulled into the referencing environment of the closure, so is the SparkContext.
> We originally hit this issue in the SciSpark project, when referencing a string variable inside a lambda expression passed to RDD.map(...).
> Any insight into how this could be resolved would be appreciated.
> While the above code is trivial, SciSpark uses a wrapper around the SparkContext to read from various file formats. We want to keep this class structure and also use it in notebook and shell environments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
