Posted to user@spark.apache.org by adelbertc <ad...@gmail.com> on 2015/03/23 20:03:48 UTC

Getting around Serializability issues for types not in my control

Hey all,

I'd like to use the Scalaz library in some of my Spark jobs, but am running
into issues where some stuff I use from Scalaz is not serializable. For
instance, in Scalaz there is a trait

/** In Scalaz */
trait Applicative[F[_]] {
  def apply2[A, B, C](fa: F[A], fb: F[B])(f: (A, B) => C): F[C]
  def point[A](a: => A): F[A]
}

But when I try to use it in, say, an `RDD#aggregate` call, I get:


Caused by: java.io.NotSerializableException:
scalaz.std.OptionInstances$$anon$1
Serialization stack:
	- object not serializable (class: scalaz.std.OptionInstances$$anon$1,
value: scalaz.std.OptionInstances$$anon$1@4516ee8c)
	- field (class: dielectric.syntax.RDDOps$$anonfun$1, name: G$1, type:
interface scalaz.Applicative)
	- object (class dielectric.syntax.RDDOps$$anonfun$1, <function2>)
	- field (class: dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
name: apConcat$1, type: interface scala.Function2)
	- object (class dielectric.syntax.RDDOps$$anonfun$traverse$extension$1,
<function2>)

Short of submitting a PR to Scalaz to make its instances Serializable,
what can I do on my end? I considered something like

implicit def applicativeSerializable[F[_]](implicit F: Applicative[F]):
SomeSerializableType[F] =
  new SomeSerializableType[F] { ... } ??

I'm not sure how to go about doing it - I looked at java.io.Externalizable,
but given that `scalaz.Applicative` has no value members, I'm not sure how
to implement the interface.

Any guidance would be much appreciated - thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Getting-around-Serializability-issues-for-types-not-in-my-control-tp22193.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Getting around Serializability issues for types not in my control

Posted by Adelbert Chang <ad...@gmail.com>.
Instantiating the instance? The actual instance it's complaining about is:

https://github.com/scalaz/scalaz/blob/16838556c9309225013f917e577072476f46dc14/core/src/main/scala/scalaz/std/Option.scala#L10-11

The specific import where it's picking up the instance is:

https://github.com/scalaz/scalaz/blob/16838556c9309225013f917e577072476f46dc14/core/src/main/scala/scalaz/std/Option.scala#L227


Note that the object extends OptionInstances, which contains that instance.

Is the suggestion to pass in something like new OptionInstances { } into
the RDD#aggregate call?
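
Concretely, is the idea something like this? A sketch of what I think is
being suggested - writing the instance out by hand with Serializable
mixed in (this assumes Scalaz 7's signatures for point and ap):

import scalaz.Applicative

// Hand-rolled Applicative[Option] that is Serializable because it
// captures no outside state.
val optionApplicative: Applicative[Option] with Serializable =
  new Applicative[Option] with Serializable {
    def point[A](a: => A): Option[A] = Some(a)
    def ap[A, B](fa: => Option[A])(f: => Option[A => B]): Option[B] =
      f.flatMap(ff => fa.map(ff))
  }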

-- 
Adelbert (Allen) Chang

Re: Getting around Serializability issues for types not in my control

Posted by Cody Koeninger <co...@koeninger.org>.
Have you tried instantiating the instance inside the closure, rather than
outside of it?

If that works, you may need to switch to mapPartitions /
foreachPartition for efficiency reasons.
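
Something along these lines - just a sketch, with Option's Applicative
standing in for whatever nonserializable instance you need:

import org.apache.spark.rdd.RDD
import scalaz.Applicative
import scalaz.std.option._

def pointAll[B](rdd: RDD[B]): RDD[Option[B]] =
  rdd.mapPartitions { iter =>
    // Summoned here, on the executor, once per partition - the
    // instance never has to be serialized and shipped from the driver.
    val G = Applicative[Option]
    iter.map(b => G.point(b))
  }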


Re: Getting around Serializability issues for types not in my control

Posted by Adelbert Chang <ad...@gmail.com>.
Is there no way to pull out the bits of the instance I want before I send
it through the closure for aggregate? I did try pulling things out, along
the lines of

def foo[G[_], B](blah: Blah)(implicit G: Applicative[G]) = {
  val lift: B => G[RDD[B]] = b => G.point(sparkContext.parallelize(List(b)))

  rdd.aggregate(/* use lift in here */)
}

But that doesn't seem to work either; it still seems to be trying to
serialize the Applicative... :(
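
My guess at why: the lambda b => G.point(...) closes over G itself, so
serializing lift drags the whole Applicative along with it. The only fix
I can see is to specialize by hand before building the closure, along
these lines (a sketch, with the parallelize call dropped so the closure
captures nothing at all):

// Specialized to Option by hand: no G is captured.
val liftOpt: Int => Option[List[Int]] = i => Some(List(i))

but that gives up the abstraction over G, which was the whole point.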

-- 
Adelbert (Allen) Chang

Re: Getting around Serializability issues for types not in my control

Posted by Dean Wampler <de...@gmail.com>.
Well, it's complaining about trait OptionInstances, which is defined in
Option.scala in the std package. Use scalap or javap on the Scalaz library
to find out which member of the trait is the problem, but since it says
"$$anon$1", I suspect it's the first value member, "implicit val
optionInstance", which has a long list of mixin traits, one of which is
probably at fault. OptionInstances is huge, so there might be other
offenders.

Scalaz wasn't designed for distributed systems like this, so you'll
probably find many examples of nonserializability. An alternative is to
avoid using Scalaz in any closures passed to Spark methods, but that's
probably not what you want.
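
A quick way to test candidates without leaving the REPL is the standard
Java serialization round trip - just a sketch:

import java.io.{ByteArrayOutputStream, ObjectOutputStream, NotSerializableException}

// A value that fails this round trip will also fail Spark's default
// (Java) closure serialization.
def isJavaSerializable(v: AnyRef): Boolean =
  try {
    new ObjectOutputStream(new ByteArrayOutputStream).writeObject(v)
    true
  } catch { case _: NotSerializableException => false }

isJavaSerializable(scalaz.std.option.optionInstance) // false, per your trace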

dean

Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com
