You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2019/07/16 16:42:16 UTC

[jira] [Updated] (SPARK-27799) Allow SerializerManager.canUseKryo whitelist to be extended via a configuration

     [ https://issues.apache.org/jira/browse/SPARK-27799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-27799:
----------------------------------
    Affects Version/s:     (was: 2.4.0)
                       3.0.0

> Allow SerializerManager.canUseKryo whitelist to be extended via a configuration
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-27799
>                 URL: https://issues.apache.org/jira/browse/SPARK-27799
>             Project: Spark
>          Issue Type: New Feature
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Priority: Major
>
> Kryo serialization can offer a substantial performance boost compared to Java serialization and I generally recommend that users configure Spark to use it.
> That said, in general it may not be safe to _blindly_ flip the default to Kryo: certain jobs might depend on Java serialization, so switching them to Kryo might cause crashes or incorrect behavior.
> However, we may know that certain data types are safe to serialize with Kryo, in which case we can whitelist _just those types_ for use with Kryo serialization but keep everything else using the default Java serializer.
> Back in SPARK-13926 (Spark 2.0) I added a {{SerializerManager}} to implement this idea for strings, primitives, primitive arrays, and a few other data types: those types will automatically use Kryo serialization when used as top-level types in RDDs. However, there's no ability for users to customize / extend this whitelist.
> I propose to add a new user-facing configuration, name TBD, which accepts a comma-separated list of class / interface names and uses them to expand the {{SerializerMananger.canUseKryo}} whitelist.
> This will allow advanced users to incrementally default to Kryo for certain types (e.g. Scrooge ThriftStructs).
> This feature is useful for "data platform" teams who provide Spark-as-a-service to internal customers: with this proposed configuration, platform teams can configure global defaults for serialization in a way which is more incremental / narrow-in-scope than simply defaulting to Kryo everywhere.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org