You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@spark.apache.org by Simon Scott <Si...@viavisolutions.com> on 2016/06/24 08:23:17 UTC

Associating user objects with SparkContext/SparkStreamingContext

Hi,

I am developing a streaming application using checkpointing on Spark 1.5.1

I have just run into a NotSerializableException because some of the state that my streaming functions need cannot be serialized. This state is only used in the driver process, it is the checkpointing that requires the serialization.

So I am considering moving that state into a Scala "object" - i.e. global singleton that must be mutable to allow the state to be set at application start.

I would prefer to be able to create immutable state and attach it to either the SparkContext or SparkStreamingContext but I can't find an api for that.

Does anybody else think is a good idea? Is there a better way? Or would such an api be a useful enhancement to Spark?

Thanks in advance
Simon

Research Developer
Viavi Solutions

RE: Associating user objects with SparkContext/SparkStreamingContext

Posted by Simon Scott <Si...@viavisolutions.com>.
“move the functions you are passing” yes this is what I had already done – and what I hope to avoid

Thank you however for the reminder about @transient – with that I am able to create a function value that includes the non-serializable state as a @transient val. Which at least packages the solution closer to the code that causes the problem.

Cheers
Simon

From: Evan Sparks [mailto:evan.sparks@gmail.com]
Sent: 24 June 2016 16:12
To: Simon Scott <Si...@viavisolutions.com>
Cc: dev@spark.apache.org
Subject: Re: Associating user objects with SparkContext/SparkStreamingContext

I would actually think about this the other way around. Move the functions you are passing to the streaming jobs out to their own object if possible. Spark's closure capture rules are necessarily far reaching and serialize the object that contains these methods, which is a common cause of the problem you're seeing.

Another option is to mark the non-serializable state as "@transient" if it is never accessed by the worker processes.

On Jun 24, 2016, at 1:23 AM, Simon Scott <Si...@viavisolutions.com>> wrote:
Hi,

I am developing a streaming application using checkpointing on Spark 1.5.1

I have just run into a NotSerializableException because some of the state that my streaming functions need cannot be serialized. This state is only used in the driver process, it is the checkpointing that requires the serialization.

So I am considering moving that state into a Scala “object” – i.e. global singleton that must be mutable to allow the state to be set at application start.

I would prefer to be able to create immutable state and attach it to either the SparkContext or SparkStreamingContext but I can’t find an api for that.

Does anybody else think is a good idea? Is there a better way? Or would such an api be a useful enhancement to Spark?

Thanks in advance
Simon

Research Developer
Viavi Solutions

Re: Associating user objects with SparkContext/SparkStreamingContext

Posted by Evan Sparks <ev...@gmail.com>.
I would actually think about this the other way around. Move the functions you are passing to the streaming jobs out to their own object if possible. Spark's closure capture rules are necessarily far reaching and serialize the object that contains these methods, which is a common cause of the problem you're seeing. 

Another option is to mark the non-serializable state as "@transient" if it is never accessed by the worker processes. 

> On Jun 24, 2016, at 1:23 AM, Simon Scott <Si...@viavisolutions.com> wrote:
> 
> Hi,
>  
> I am developing a streaming application using checkpointing on Spark 1.5.1
>  
> I have just run into a NotSerializableException because some of the state that my streaming functions need cannot be serialized. This state is only used in the driver process, it is the checkpointing that requires the serialization.
>  
> So I am considering moving that state into a Scala “object” – i.e. global singleton that must be mutable to allow the state to be set at application start.
>  
> I would prefer to be able to create immutable state and attach it to either the SparkContext or SparkStreamingContext but I can’t find an api for that.
>  
> Does anybody else think is a good idea? Is there a better way? Or would such an api be a useful enhancement to Spark?
>  
> Thanks in advance
> Simon
>  
> Research Developer
> Viavi Solutions