Posted to user@spark.apache.org by Chen Song <ch...@gmail.com> on 2015/07/09 18:04:46 UTC

Spark serialization in closure

I am not sure whether this is more of a Spark question or a Scala question, but I am
posting it here.

The code snippet below shows an example of referencing an outside object from a
closure passed to the rdd.foreachPartition method.

```
object testing {
  object foo extends Serializable {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)  // `sc` is the SparkContext from the spark-shell
  def func = {
    val after = rdd.foreachPartition {
      it => println(foo.v)
    }
  }
}
```
When I run this code, I get the following exception:

```
Caused by: java.io.NotSerializableException:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
Serialization stack:
- object not serializable (class:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
- field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
- object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
<function1>)
```

It looks like Spark needs to serialize the `testing` object. Why is it
serializing `testing` even though I only reference `foo` (another serializable
object) in the closure?

A more general question: how can I prevent Spark from serializing the parent
class in which the RDD is defined, while still being able to pass in functions
defined in other classes?

-- 
Chen Song

Re: Spark serialization in closure

Posted by Chen Song <ch...@gmail.com>.
Thanks Andrew.

I tried your suggestions and (2) works for me; (1) still doesn't.

Chen

On Thu, Jul 9, 2015 at 4:58 PM, Andrew Or <an...@databricks.com> wrote:

> Hi Chen,
>
> I believe the issue is that `object foo` is a member of `object testing`,
> so the only way to access `object foo` is to first pull `object testing`
> into the closure, then access a pointer to get to `object foo`. There are
> two workarounds that I'm aware of:
>
> (1) Move `object foo` outside of `object testing`. This is only a problem
> because of the nested objects. Also, by design it's simpler to reason about
> but that's a separate discussion.
>
> (2) Create a local variable for `foo.v`. If all your closure cares about
> is the integer, then it makes sense to add a `val v = foo.v` inside `func`
> and use this in your closure instead. This avoids pulling in $outer
> pointers into your closure at all since it only references local variables.
>
> As others have commented, I think this is more of a Scala problem than a
> Spark one.
>
> Let me know if these work,
> -Andrew
>
> 2015-07-09 13:36 GMT-07:00 Richard Marscher <rm...@localytics.com>:
>
>> Reading that article and applying it to your observations of what happens
>> at runtime:
>>
>> shouldn't the closure require serializing testing? The foo singleton
>> object is a member of testing, and then you call this foo value in the
>> closure func and further in the foreachPartition closure. So following by
>> that article, Scala will attempt to serialize the containing object/class
>> testing to get the foo instance.
>>
>> On Thu, Jul 9, 2015 at 4:11 PM, Chen Song <ch...@gmail.com> wrote:
>>
>>> Repost the code example,
>>>
>>> object testing extends Serializable {
>>>     object foo {
>>>       val v = 42
>>>     }
>>>     val list = List(1,2,3)
>>>     val rdd = sc.parallelize(list)
>>>     def func = {
>>>       val after = rdd.foreachPartition {
>>>         it => println(foo.v)
>>>       }
>>>     }
>>>   }
>>>
>>> On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <ch...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Erik. I saw the document too. That is why I am confused because
>>>> as per the article, it should be good as long as *foo *is
>>>> serializable. However, what I have seen is that it would work if
>>>> *testing* is serializable, even foo is not serializable, as shown
>>>> below. I don't know if there is something specific to Spark.
>>>>
>>>> For example, the code example below works.
>>>>
>>>> object testing extends Serializable {
>>>>
>>>>     object foo {
>>>>
>>>>       val v = 42
>>>>
>>>>     }
>>>>
>>>>     val list = List(1,2,3)
>>>>
>>>>     val rdd = sc.parallelize(list)
>>>>
>>>>     def func = {
>>>>
>>>>       val after = rdd.foreachPartition {
>>>>
>>>>         it => println(foo.v)
>>>>
>>>>       }
>>>>
>>>>     }
>>>>
>>>>   }
>>>>
>>>> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <ej...@redhat.com> wrote:
>>>>
>>>>> I think you have stumbled across this idiosyncrasy:
>>>>>
>>>>>
>>>>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> ----- Original Message -----
>>>>> > I am not sure this is more of a question for Spark or just Scala but
>>>>> I am
>>>>> > posting my question here.
>>>>> >
>>>>> > The code snippet below shows an example of passing a reference to a
>>>>> closure
>>>>> > in rdd.foreachPartition method.
>>>>> >
>>>>> > ```
>>>>> > object testing {
>>>>> >     object foo extends Serializable {
>>>>> >       val v = 42
>>>>> >     }
>>>>> >     val list = List(1,2,3)
>>>>> >     val rdd = sc.parallelize(list)
>>>>> >     def func = {
>>>>> >       val after = rdd.foreachPartition {
>>>>> >         it => println(foo.v)
>>>>> >       }
>>>>> >     }
>>>>> >   }
>>>>> > ```
>>>>> > When running this code, I got an exception
>>>>> >
>>>>> > ```
>>>>> > Caused by: java.io.NotSerializableException:
>>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>>>>> > Serialization stack:
>>>>> > - object not serializable (class:
>>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>>>>> > - field (class:
>>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>>> > name: $outer, type: class
>>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>>>>> > - object (class
>>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>>> > <function1>)
>>>>> > ```
>>>>> >
>>>>> > It looks like Spark needs to serialize `testing` object. Why is it
>>>>> > serializing testing even though I only pass foo (another serializable
>>>>> > object) in the closure?
>>>>> >
>>>>> > A more general question is, how can I prevent Spark from serializing
>>>>> the
>>>>> > parent class where RDD is defined, with still support of passing in
>>>>> > function defined in other classes?
>>>>> >
>>>>> > --
>>>>> > Chen Song
>>>>> >
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Chen Song
>>>>
>>>>
>>>
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>>
>> --
>> --
>> *Richard Marscher*
>> Software Engineer
>> Localytics
>> Localytics.com <http://localytics.com/> | Our Blog
>> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
>> Facebook <http://facebook.com/localytics> | LinkedIn
>> <http://www.linkedin.com/company/1148792?trk=tyah>
>>
>
>


-- 
Chen Song

Re: Spark serialization in closure

Posted by Andrew Or <an...@databricks.com>.
Hi Chen,

I believe the issue is that `object foo` is a member of `object testing`,
so the only way to access `object foo` is to first pull `object testing`
into the closure, then access a pointer to get to `object foo`. There are
two workarounds that I'm aware of:

(1) Move `object foo` outside of `object testing`. The problem only arises
because of the nested objects. (A flatter design is also simpler to reason
about, but that's a separate discussion.)

(2) Create a local variable for `foo.v`. If all your closure cares about is
the integer, then it makes sense to add a `val v = foo.v` inside `func` and use
that in your closure instead. This avoids pulling any $outer pointers into your
closure at all, since the closure then only references local variables; see the
sketch below.
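
For concreteness, here is a minimal sketch of (2), mirroring your snippet and
assuming `sc` is the SparkContext available in the spark-shell:

```
object testing {
  object foo extends Serializable {
    val v = 42
  }
  val list = List(1, 2, 3)
  val rdd = sc.parallelize(list)  // assumes `sc` is in scope, as in the spark-shell
  def func = {
    val v = foo.v                 // capture just the Int in a local val
    rdd.foreachPartition { it =>
      println(v)                  // the closure references only the local `v`,
                                  // so no $outer pointer to `testing` is captured
    }
  }
}
```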

As others have commented, I think this is more of a Scala problem than a
Spark one.

Let me know if these work,
-Andrew

2015-07-09 13:36 GMT-07:00 Richard Marscher <rm...@localytics.com>:

> Reading that article and applying it to your observations of what happens
> at runtime:
>
> shouldn't the closure require serializing testing? The foo singleton
> object is a member of testing, and then you call this foo value in the
> closure func and further in the foreachPartition closure. So following by
> that article, Scala will attempt to serialize the containing object/class
> testing to get the foo instance.
>
> On Thu, Jul 9, 2015 at 4:11 PM, Chen Song <ch...@gmail.com> wrote:
>
>> Repost the code example,
>>
>> object testing extends Serializable {
>>     object foo {
>>       val v = 42
>>     }
>>     val list = List(1,2,3)
>>     val rdd = sc.parallelize(list)
>>     def func = {
>>       val after = rdd.foreachPartition {
>>         it => println(foo.v)
>>       }
>>     }
>>   }
>>
>> On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <ch...@gmail.com> wrote:
>>
>>> Thanks Erik. I saw the document too. That is why I am confused because
>>> as per the article, it should be good as long as *foo *is serializable.
>>> However, what I have seen is that it would work if *testing* is
>>> serializable, even foo is not serializable, as shown below. I don't know if
>>> there is something specific to Spark.
>>>
>>> For example, the code example below works.
>>>
>>> object testing extends Serializable {
>>>
>>>     object foo {
>>>
>>>       val v = 42
>>>
>>>     }
>>>
>>>     val list = List(1,2,3)
>>>
>>>     val rdd = sc.parallelize(list)
>>>
>>>     def func = {
>>>
>>>       val after = rdd.foreachPartition {
>>>
>>>         it => println(foo.v)
>>>
>>>       }
>>>
>>>     }
>>>
>>>   }
>>>
>>> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <ej...@redhat.com> wrote:
>>>
>>>> I think you have stumbled across this idiosyncrasy:
>>>>
>>>>
>>>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>>>
>>>>
>>>>
>>>>
>>>> ----- Original Message -----
>>>> > I am not sure this is more of a question for Spark or just Scala but
>>>> I am
>>>> > posting my question here.
>>>> >
>>>> > The code snippet below shows an example of passing a reference to a
>>>> closure
>>>> > in rdd.foreachPartition method.
>>>> >
>>>> > ```
>>>> > object testing {
>>>> >     object foo extends Serializable {
>>>> >       val v = 42
>>>> >     }
>>>> >     val list = List(1,2,3)
>>>> >     val rdd = sc.parallelize(list)
>>>> >     def func = {
>>>> >       val after = rdd.foreachPartition {
>>>> >         it => println(foo.v)
>>>> >       }
>>>> >     }
>>>> >   }
>>>> > ```
>>>> > When running this code, I got an exception
>>>> >
>>>> > ```
>>>> > Caused by: java.io.NotSerializableException:
>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>>>> > Serialization stack:
>>>> > - object not serializable (class:
>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>>>> > - field (class:
>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>> > name: $outer, type: class
>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>>>> > - object (class
>>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>>> > <function1>)
>>>> > ```
>>>> >
>>>> > It looks like Spark needs to serialize `testing` object. Why is it
>>>> > serializing testing even though I only pass foo (another serializable
>>>> > object) in the closure?
>>>> >
>>>> > A more general question is, how can I prevent Spark from serializing
>>>> the
>>>> > parent class where RDD is defined, with still support of passing in
>>>> > function defined in other classes?
>>>> >
>>>> > --
>>>> > Chen Song
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Chen Song
>>>
>>>
>>
>>
>> --
>> Chen Song
>>
>>
>
>
> --
> --
> *Richard Marscher*
> Software Engineer
> Localytics
> Localytics.com <http://localytics.com/> | Our Blog
> <http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
> Facebook <http://facebook.com/localytics> | LinkedIn
> <http://www.linkedin.com/company/1148792?trk=tyah>
>

Re: Spark serialization in closure

Posted by Richard Marscher <rm...@localytics.com>.
Reading that article and applying it to your observations of what happens
at runtime:

shouldn't the closure require serializing testing? The foo singleton object
is a member of testing, and you reference foo inside func and, further, inside
the foreachPartition closure. So, following that article, Scala will attempt to
serialize the containing object/class testing to get at the foo instance.
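
To make that concrete, here is a rough sketch of the closure the compiler
generates for `it => println(foo.v)` inside testing.func. The class and field
names here are simplified and hypothetical, not literal compiler output; the
real generated field is called $outer, which is exactly what shows up in the
serialization stack:

```
// Evaluating foo.v needs the enclosing `testing` instance, so the function
// object holds a reference to it, and serializing the function also tries
// to serialize `testing`.
class GeneratedClosure(outer: testing.type) extends (Iterator[Int] => Unit) with Serializable {
  def apply(it: Iterator[Int]): Unit = println(outer.foo.v)
}
```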

On Thu, Jul 9, 2015 at 4:11 PM, Chen Song <ch...@gmail.com> wrote:

> Repost the code example,
>
> object testing extends Serializable {
>     object foo {
>       val v = 42
>     }
>     val list = List(1,2,3)
>     val rdd = sc.parallelize(list)
>     def func = {
>       val after = rdd.foreachPartition {
>         it => println(foo.v)
>       }
>     }
>   }
>
> On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <ch...@gmail.com> wrote:
>
>> Thanks Erik. I saw the document too. That is why I am confused because as
>> per the article, it should be good as long as *foo *is serializable.
>> However, what I have seen is that it would work if *testing* is
>> serializable, even foo is not serializable, as shown below. I don't know if
>> there is something specific to Spark.
>>
>> For example, the code example below works.
>>
>> object testing extends Serializable {
>>
>>     object foo {
>>
>>       val v = 42
>>
>>     }
>>
>>     val list = List(1,2,3)
>>
>>     val rdd = sc.parallelize(list)
>>
>>     def func = {
>>
>>       val after = rdd.foreachPartition {
>>
>>         it => println(foo.v)
>>
>>       }
>>
>>     }
>>
>>   }
>>
>> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <ej...@redhat.com> wrote:
>>
>>> I think you have stumbled across this idiosyncrasy:
>>>
>>>
>>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>>
>>>
>>>
>>>
>>> ----- Original Message -----
>>> > I am not sure this is more of a question for Spark or just Scala but I
>>> am
>>> > posting my question here.
>>> >
>>> > The code snippet below shows an example of passing a reference to a
>>> closure
>>> > in rdd.foreachPartition method.
>>> >
>>> > ```
>>> > object testing {
>>> >     object foo extends Serializable {
>>> >       val v = 42
>>> >     }
>>> >     val list = List(1,2,3)
>>> >     val rdd = sc.parallelize(list)
>>> >     def func = {
>>> >       val after = rdd.foreachPartition {
>>> >         it => println(foo.v)
>>> >       }
>>> >     }
>>> >   }
>>> > ```
>>> > When running this code, I got an exception
>>> >
>>> > ```
>>> > Caused by: java.io.NotSerializableException:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>>> > Serialization stack:
>>> > - object not serializable (class:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>>> > - field (class:
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>> > name: $outer, type: class
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>>> > - object (class
>>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>>> > <function1>)
>>> > ```
>>> >
>>> > It looks like Spark needs to serialize `testing` object. Why is it
>>> > serializing testing even though I only pass foo (another serializable
>>> > object) in the closure?
>>> >
>>> > A more general question is, how can I prevent Spark from serializing
>>> the
>>> > parent class where RDD is defined, with still support of passing in
>>> > function defined in other classes?
>>> >
>>> > --
>>> > Chen Song
>>> >
>>>
>>
>>
>>
>> --
>> Chen Song
>>
>>
>
>
> --
> Chen Song
>
>


-- 
*Richard Marscher*
Software Engineer
Localytics
Localytics.com <http://localytics.com/> | Our Blog
<http://localytics.com/blog> | Twitter <http://twitter.com/localytics> |
Facebook <http://facebook.com/localytics> | LinkedIn
<http://www.linkedin.com/company/1148792?trk=tyah>

Re: Spark serialization in closure

Posted by Chen Song <ch...@gmail.com>.
Reposting the code example:

object testing extends Serializable {
    object foo {
      val v = 42
    }
    val list = List(1,2,3)
    val rdd = sc.parallelize(list)
    def func = {
      val after = rdd.foreachPartition {
        it => println(foo.v)
      }
    }
  }

On Thu, Jul 9, 2015 at 4:09 PM, Chen Song <ch...@gmail.com> wrote:

> Thanks Erik. I saw the document too. That is why I am confused because as
> per the article, it should be good as long as *foo *is serializable.
> However, what I have seen is that it would work if *testing* is
> serializable, even foo is not serializable, as shown below. I don't know if
> there is something specific to Spark.
>
> For example, the code example below works.
>
> object testing extends Serializable {
>
>     object foo {
>
>       val v = 42
>
>     }
>
>     val list = List(1,2,3)
>
>     val rdd = sc.parallelize(list)
>
>     def func = {
>
>       val after = rdd.foreachPartition {
>
>         it => println(foo.v)
>
>       }
>
>     }
>
>   }
>
> On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <ej...@redhat.com> wrote:
>
>> I think you have stumbled across this idiosyncrasy:
>>
>>
>> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>>
>>
>>
>>
>> ----- Original Message -----
>> > I am not sure this is more of a question for Spark or just Scala but I
>> am
>> > posting my question here.
>> >
>> > The code snippet below shows an example of passing a reference to a
>> closure
>> > in rdd.foreachPartition method.
>> >
>> > ```
>> > object testing {
>> >     object foo extends Serializable {
>> >       val v = 42
>> >     }
>> >     val list = List(1,2,3)
>> >     val rdd = sc.parallelize(list)
>> >     def func = {
>> >       val after = rdd.foreachPartition {
>> >         it => println(foo.v)
>> >       }
>> >     }
>> >   }
>> > ```
>> > When running this code, I got an exception
>> >
>> > ```
>> > Caused by: java.io.NotSerializableException:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
>> > Serialization stack:
>> > - object not serializable (class:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
>> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
>> > - field (class:
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> > name: $outer, type: class
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
>> > - object (class
>> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
>> > <function1>)
>> > ```
>> >
>> > It looks like Spark needs to serialize `testing` object. Why is it
>> > serializing testing even though I only pass foo (another serializable
>> > object) in the closure?
>> >
>> > A more general question is, how can I prevent Spark from serializing the
>> > parent class where RDD is defined, with still support of passing in
>> > function defined in other classes?
>> >
>> > --
>> > Chen Song
>> >
>>
>
>
>
> --
> Chen Song
>
>


-- 
Chen Song

Re: Spark serialization in closure

Posted by Chen Song <ch...@gmail.com>.
Thanks Erik. I saw that post too. That is why I am confused: per the article,
it should be fine as long as *foo* is serializable. However, what I have seen
is that it works if *testing* is serializable, even when foo is not
serializable, as shown below. I don't know if there is something specific to
Spark here.

For example, the code example below works.

object testing extends Serializable {
    object foo {
      val v = 42
    }
    val list = List(1,2,3)
    val rdd = sc.parallelize(list)
    def func = {
      val after = rdd.foreachPartition {
        it => println(foo.v)
      }
    }
  }

On Thu, Jul 9, 2015 at 12:11 PM, Erik Erlandson <ej...@redhat.com> wrote:

> I think you have stumbled across this idiosyncrasy:
>
>
> http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
>
>
>
>
> ----- Original Message -----
> > I am not sure this is more of a question for Spark or just Scala but I am
> > posting my question here.
> >
> > The code snippet below shows an example of passing a reference to a
> closure
> > in rdd.foreachPartition method.
> >
> > ```
> > object testing {
> >     object foo extends Serializable {
> >       val v = 42
> >     }
> >     val list = List(1,2,3)
> >     val rdd = sc.parallelize(list)
> >     def func = {
> >       val after = rdd.foreachPartition {
> >         it => println(foo.v)
> >       }
> >     }
> >   }
> > ```
> > When running this code, I got an exception
> >
> > ```
> > Caused by: java.io.NotSerializableException:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> > Serialization stack:
> > - object not serializable (class:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> > $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> > - field (class:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> > name: $outer, type: class
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> > - object (class
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> > <function1>)
> > ```
> >
> > It looks like Spark needs to serialize `testing` object. Why is it
> > serializing testing even though I only pass foo (another serializable
> > object) in the closure?
> >
> > A more general question is, how can I prevent Spark from serializing the
> > parent class where RDD is defined, with still support of passing in
> > function defined in other classes?
> >
> > --
> > Chen Song
> >
>



-- 
Chen Song

Re: Spark serialization in closure

Posted by Erik Erlandson <ej...@redhat.com>.
I think you have stumbled across this idiosyncrasy:

http://erikerlandson.github.io/blog/2015/03/31/hygienic-closures-for-scala-function-serialization/
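
Not to restate the post, but a sketch of the local-binding idea in my own
words (this is not code from the article): generate the closure from a function
whose parameter holds the needed value, so the resulting function captures only
that parameter and not any enclosing object.

```
// A closure factory: the returned function closes over the Int parameter only,
// so it serializes cleanly regardless of where it is defined.
def printV(v: Int): Iterator[Int] => Unit =
  it => println(v)

// inside func:  rdd.foreachPartition(printV(foo.v))
```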




----- Original Message -----
> I am not sure this is more of a question for Spark or just Scala but I am
> posting my question here.
> 
> The code snippet below shows an example of passing a reference to a closure
> in rdd.foreachPartition method.
> 
> ```
> object testing {
>     object foo extends Serializable {
>       val v = 42
>     }
>     val list = List(1,2,3)
>     val rdd = sc.parallelize(list)
>     def func = {
>       val after = rdd.foreachPartition {
>         it => println(foo.v)
>       }
>     }
>   }
> ```
> When running this code, I got an exception
> 
> ```
> Caused by: java.io.NotSerializableException:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$
> Serialization stack:
> - object not serializable (class:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$, value:
> $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$@10b7e824)
> - field (class: $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> name: $outer, type: class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$)
> - object (class $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$testing$$anonfun$1,
> <function1>)
> ```
> 
> It looks like Spark needs to serialize `testing` object. Why is it
> serializing testing even though I only pass foo (another serializable
> object) in the closure?
> 
> A more general question is, how can I prevent Spark from serializing the
> parent class where RDD is defined, with still support of passing in
> function defined in other classes?
> 
> --
> Chen Song
> 
