Posted to dev@spark.apache.org by Michael Malak <mi...@yahoo.com.INVALID> on 2015/01/19 21:20:14 UTC

GraphX vertex partition/location strategy

Does GraphX make an effort to co-locate vertices onto the same workers as the majority (or even some) of their edges?



Re: GraphX vertex partition/location strategy

Posted by Michael Malak <mi...@yahoo.com.INVALID>.
But wouldn't the gain be greater under something similar to EdgePartition1D (but perhaps better load-balanced based on number of edges for each vertex) and an algorithm that primarily follows edges in the forward direction?
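
To illustrate the question above, here is a minimal Python sketch of source-only edge partitioning, the idea behind EdgePartition1D. It is not GraphX code: `partition_by_source`, `out_edge_partitions`, and `in_edge_partitions` are hypothetical helpers, and the plain modulo is a simplified stand-in for whatever hash GraphX actually applies to the source ID. It shows only that every out-edge of a vertex lands in one partition, while its in-edges can still be spread across several.

```python
from collections import defaultdict


def partition_by_source(src, dst, num_partitions):
    # Assumption: a plain modulo on the source ID stands in for the real hash.
    # The destination ID is deliberately ignored.
    return src % num_partitions


def out_edge_partitions(edges, num_partitions):
    """Map each source vertex to the set of partitions holding its out-edges."""
    parts = defaultdict(set)
    for src, dst in edges:
        parts[src].add(partition_by_source(src, dst, num_partitions))
    return dict(parts)


def in_edge_partitions(edges, num_partitions):
    """Map each destination vertex to the set of partitions holding its in-edges."""
    parts = defaultdict(set)
    for src, dst in edges:
        parts[dst].add(partition_by_source(src, dst, num_partitions))
    return dict(parts)


# Tiny made-up directed graph, given as (src, dst) pairs.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 0), (3, 0)]

out_parts = out_edge_partitions(edges, num_partitions=4)
in_parts = in_edge_partitions(edges, num_partitions=4)

print(out_parts[0])  # all out-edges of vertex 0 share one partition
print(in_parts[0])   # in-edges of vertex 0 come from other partitions
```

Under this scheme, an algorithm that only sends messages along out-edges would find all of a vertex's relevant edges in one place. The sketch does not address the load imbalance that a very high-degree source vertex would cause.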

From: Ankur Dave <an...@gmail.com>
To: Michael Malak <mi...@yahoo.com>
Cc: "dev@spark.apache.org" <de...@spark.apache.org>
Sent: Monday, January 19, 2015 2:08 PM
Subject: Re: GraphX vertex partition/location strategy

No - the vertices are hash-partitioned onto workers independently of the edges. It would be nice for each vertex to be on the worker with the most adjacent edges, but we haven't done this yet since it would add a lot of complexity to avoid load imbalance while reducing the overall communication by a small factor.
We refer to the number of partitions containing adjacent edges for a particular vertex as the vertex's replication factor. I think the typical replication factor for power-law graphs with 100-200 partitions is 10-15, and placing the vertex at the ideal location would only reduce the replication factor by 1.

Ankur


On Mon, Jan 19, 2015 at 12:20 PM, Michael Malak <mi...@yahoo.com.invalid> wrote:

Does GraphX make an effort to co-locate vertices onto the same workers as the majority (or even some) of their edges?

Re: GraphX vertex partition/location strategy

Posted by Ankur Dave <an...@gmail.com>.
No - the vertices are hash-partitioned onto workers independently of the
edges. It would be nice for each vertex to be on the worker with the most
adjacent edges, but we haven't done this yet since it would add a lot of
complexity to avoid load imbalance while reducing the overall communication
by a small factor.

We refer to the number of partitions containing adjacent edges for a
particular vertex as the vertex's replication factor. I think the typical
replication factor for power-law graphs with 100-200 partitions is 10-15,
and placing the vertex at the ideal location would only reduce the
replication factor by 1.
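
To make the replication-factor definition above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not GraphX code: `edge_partition` is a hypothetical toy partitioner, the star graph is made up, and `remote_copies` simply encodes the point that co-locating a vertex with one of its edge partitions saves one copy.

```python
from collections import defaultdict


def edge_partition(src, dst, num_partitions):
    # Assumption: a toy partitioner that uses both endpoints.
    # It is not the hash GraphX uses.
    return (src + dst) % num_partitions


def replication_factors(edges, num_partitions):
    """Map each vertex to the number of distinct edge partitions
    that contain at least one edge adjacent to it."""
    parts = defaultdict(set)
    for src, dst in edges:
        p = edge_partition(src, dst, num_partitions)
        parts[src].add(p)
        parts[dst].add(p)
    return {v: len(ps) for v, ps in parts.items()}


def remote_copies(replication_factor, colocated):
    """Copies of a vertex that must be shipped to other partitions.
    If the vertex lives alongside one of its edge partitions, that copy is local."""
    return replication_factor - 1 if colocated else replication_factor


# Made-up star graph: hub vertex 0 points at leaves 1..20.
edges = [(0, leaf) for leaf in range(1, 21)]
rf = replication_factors(edges, num_partitions=4)

print(rf[0], rf[1])  # 4 1
print(remote_copies(rf[0], colocated=False), remote_copies(rf[0], colocated=True))  # 4 3
```

With the figures quoted above, a vertex with a replication factor of 12 would still need 11 remote copies even under ideal placement, which is why the expected saving is small.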

Ankur <http://www.ankurdave.com/>

On Mon, Jan 19, 2015 at 12:20 PM, Michael Malak <
michaelmalak@yahoo.com.invalid> wrote:

> Does GraphX make an effort to co-locate vertices onto the same workers as
> the majority (or even some) of their edges?