Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2016/01/11 11:15:39 UTC

[jira] [Resolved] (SPARK-5883) Add compression scheme in VertexAttributeBlock for shipping vertices to edge partitions

     [ https://issues.apache.org/jira/browse/SPARK-5883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-5883.
------------------------------
    Resolution: Won't Fix

> Add compression scheme in VertexAttributeBlock for shipping vertices to edge partitions
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-5883
>                 URL: https://issues.apache.org/jira/browse/SPARK-5883
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>            Reporter: Takeshi Yamamuro
>
> The size of the data shipped between vertex partitions and edge partitions
> is one of the major obstacles to better performance.
> SPARK-3649 reported a ~10% performance gain in Pregel iterations
> from using custom serializers for ShuffledRDD.
> However, it is difficult to implement efficient serializers for ShuffledRDD
> inside GraphX because 1) the way serializers are used in ShuffledRDD differs
> between SortShuffleManager and HashShuffleManager (see SPARK-3649),
> and 2) the type 'VD' is unknown to GraphX.
> Therefore, I think that compressing the shipped data inside GraphX
> (before it is passed into ShuffledRDD) is a better solution.
> GraphX users register a user-defined serializer for VD, and
> GraphX then uses that serializer to compress the data shipped between
> vertex partitions and edge partitions.
> My current patch applies this idea in ReplicatedVertexView#upgrade
> and ReplicatedVertexView#updateVertices.
> https://github.com/maropu/spark/commit/665b6c4a273b90e7c6e1545f982c7576a0e5ceb2
> It could also be applied to ReplicatedVertexView#withActiveSet
> and VertexRDDImpl#aggregateUsingIndex.
> I'm not sure whether this design is acceptable, so any advice is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)