You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Andrew Ash (JIRA)" <ji...@apache.org> on 2014/11/12 07:21:33 UTC

[jira] [Commented] (SPARK-720) Statically guarantee serialization will succeed

    [ https://issues.apache.org/jira/browse/SPARK-720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14207732#comment-14207732 ] 

Andrew Ash commented on SPARK-720:
----------------------------------

A big design goal of Spark is that you don't have to have any type restrictions on the objects contained in an RDD.  If the parametrized type of the RDD happens to implement Comparable then the shuffler can make optimizations, but it's not a requirement.  It's also not a requirement that the type implements Serializable, because you could be using Kryo to serialize and transport an object out of the Spark user's control that doesn't have the Serializable marker interface.  I think it would be a hard sell to add restrictions to the type parametrized type of an RDD.

Getting over that, I think the intention of guaranteeing serialization via the type system could work for standard JVM serialization (Serializable and Externalizable interfaces) because a class's serializability is clearly marked with those interfaces already.  But I'm concerned that it couldn't be made to work with other serializer systems such as Kryo where there is no convenient marker interface.

Kryo does serialization by registering a serializer for each class, and using that Class->Serializer map for future serialization by reflectively looking at an object's type as it receives objects for serialization.  The only way to know if a class is serializable is to know what classes a Kryo instance has registered at compile time, which I believe is impossible given that the Registrator comes from outside the Spark codebase.

[~emchristiansen] do you see a way to implement this generically for JVM serialization + Kryo + other systems in the future?  I think we may have to close this request for infeasibility.

> Statically guarantee serialization will succeed
> -----------------------------------------------
>
>                 Key: SPARK-720
>                 URL: https://issues.apache.org/jira/browse/SPARK-720
>             Project: Spark
>          Issue Type: Improvement
>    Affects Versions: 0.7.1
>            Reporter: Eric Christiansen
>
> First, thanks for developing Spark. It's great.
> Maybe I'm trying to serialize weird objects (eg Shapeless constructs), but I tend to get quite a few NotSerializableExceptions. These are pretty annoying because they happen at runtime, lengthening my code/debug cycle. 
> I'd like it if Spark could introduce a serialization system that could statically check that serialization will succeed. One approach is to use typeclasses, perhaps using Spray-Json as inspiration. An added benefit of typeclasses is they can be used to serialize objects that were not originally intended to be serialized.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org