Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2015/05/02 07:52:05 UTC

[jira] [Commented] (SPARK-6986) Makes SparkSqlSerializer2 support sort-based shuffle with sort merge

    [ https://issues.apache.org/jira/browse/SPARK-6986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525070#comment-14525070 ] 

Apache Spark commented on SPARK-6986:
-------------------------------------

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/5849

> Makes SparkSqlSerializer2 support sort-based shuffle with sort merge
> --------------------------------------------------------------------
>
>                 Key: SPARK-6986
>                 URL: https://issues.apache.org/jira/browse/SPARK-6986
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Yin Huai
>            Assignee: Yin Huai
>
> *Update*: SPARK-4550 has exposed the interfaces. We can safely enable Serializer2 to support sort merge.
> *Original description*:
> Our existing Java and Kryo serializers are both general-purpose serializers. They treat every object individually and encode the type of each object into the underlying stream. For Spark, it is common to serialize a collection of records that all have the same type (for example, the records of a DataFrame). In these cases, we do not need to write out the types of the records, and we can take advantage of the type information to build a specialized serializer. To do so, it seems we need to extend the interface of SerializationStream/DeserializationStream so that a stream has more information about the objects passed in (for example, whether an object is a key/value pair, a key, or a value).
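
The idea in the description can be illustrated with a minimal sketch. This is not the SparkSqlSerializer2 implementation and does not use Spark's SerializationStream/DeserializationStream interfaces; it is a standalone Python illustration, and the schema, the type names, and the functions write_row, read_row, and roundtrip are all hypothetical. It only shows the general technique: when writer and reader share a schema, each record can be written as raw values with no per-record type information.

```python
import io
import struct

# Hypothetical schema shared by writer and reader: (column name, type) pairs.
# Because both sides know it, no type tags are written to the stream.
SCHEMA = [("id", "long"), ("score", "double"), ("name", "string")]


def write_row(out, row, schema=SCHEMA):
    """Write only the values of one row, in schema order."""
    for (_, col_type), value in zip(schema, row):
        if col_type == "long":
            out.write(struct.pack(">q", value))      # 8-byte big-endian integer
        elif col_type == "double":
            out.write(struct.pack(">d", value))      # 8-byte IEEE 754 double
        elif col_type == "string":
            data = value.encode("utf-8")
            out.write(struct.pack(">i", len(data)))  # 4-byte length prefix
            out.write(data)
        else:
            raise ValueError(f"unsupported type: {col_type}")


def read_row(inp, schema=SCHEMA):
    """Read one row back, using the schema to decide how to decode each field."""
    row = []
    for _, col_type in schema:
        if col_type == "long":
            row.append(struct.unpack(">q", inp.read(8))[0])
        elif col_type == "double":
            row.append(struct.unpack(">d", inp.read(8))[0])
        elif col_type == "string":
            (length,) = struct.unpack(">i", inp.read(4))
            row.append(inp.read(length).decode("utf-8"))
        else:
            raise ValueError(f"unsupported type: {col_type}")
    return tuple(row)


def roundtrip(rows):
    """Serialize rows into one buffer, then decode them; return (rows, byte count)."""
    buf = io.BytesIO()
    for row in rows:
        write_row(buf, row)
    size = buf.tell()
    buf.seek(0)
    return [read_row(buf) for _ in rows], size
```

A general-purpose serializer would additionally have to record what kind of object each record is; here that information lives only in the shared schema, which is the saving the description refers to.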



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
