Posted to user@spark.apache.org by arpp <ar...@gmail.com> on 2015/03/28 13:54:12 UTC

Custom edge partitioning in graphX

Hi all,
I am working with Spark 1.0.0, mainly to use GraphX, and I want to apply a
custom partitioning strategy to the edge list of a graph.
I have generated an edge list file in which each line has the partition
number after the source and destination ids. At the moment I first load the
unannotated graph using GraphLoader, then load the annotated file and apply
the partitioning as follows:


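import org.apache.spark.HashPartitioner
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark 1.0.0
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD

val unpartitionedGraph =
  GraphLoader.edgeListFile(sc, fname, minEdgePartitions = numEPart).cache()
// Workaround for Spark 1.0.0: rebuild the graph from manually repartitioned edges.
val graph = Graph(unpartitionedGraph.vertices,
  partitionCustom(unpartitionedGraph.edges))

// Key each edge by its target partition, shuffle, then drop the key again.
def partitionCustom[ED](edges: RDD[Edge[ED]]): RDD[Edge[ED]] = {
  edges.map(e => (customPartition(e.srcId, e.dstId), e))
    .partitionBy(new HashPartitioner(numPartitions))
    .mapPartitions(_.map(_._2), preservesPartitioning = true)
}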


def customPartition(src: VertexId, dst: VertexId): PartitionID = {
  // search for the line with this src and dst in the loaded annotated file,
  // read the third element of that line and return it
  ???  // lookup omitted here
}

This method is inefficient because it loads the same data more than once, and
it is slow because it performs a lookup for every edge against a really huge
edge list file.
Please suggest a more efficient way of doing this. Please also note that I am
stuck with Spark 1.0.0, as I am only a user of the cluster available to me.
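
One direction that would avoid both the second load and the per-edge lookup is
to build the graph straight from the annotated file, using the third column
as the shuffle key. The following is only a rough sketch under several
assumptions: AnnotatedEdgeLoader, parseLine, load and annotatedFile are
made-up names, the file is assumed to be whitespace-separated
"src dst partition" lines, the partition numbers are assumed to lie in
0 until numPartitions, and the edge and vertex attributes are set to 1 to
mirror what GraphLoader.edgeListFile produces. I have not run it on a
1.0.0 cluster.

```scala
import org.apache.spark.{HashPartitioner, SparkContext}
import org.apache.spark.SparkContext._  // pair-RDD implicits, needed on Spark 1.0.x
import org.apache.spark.graphx.{Edge, Graph, PartitionID}
import org.apache.spark.rdd.RDD

// Hypothetical helper object; none of these names come from GraphX itself.
object AnnotatedEdgeLoader {

  // Parse one "src dst partition" line into (partition, edge).
  // Returns None for comment lines, short lines, and non-numeric fields.
  def parseLine(line: String): Option[(PartitionID, Edge[Int])] = {
    val trimmed = line.trim
    val fields = trimmed.split("\\s+")
    if (trimmed.startsWith("#") || fields.length < 3) None
    else {
      try {
        Some((fields(2).toInt, Edge(fields(0).toLong, fields(1).toLong, 1)))
      } catch {
        case _: NumberFormatException => None
      }
    }
  }

  // Read the annotated file once and place each edge in the partition it names.
  def load(sc: SparkContext, annotatedFile: String, numPartitions: Int): Graph[Int, Int] = {
    val edges: RDD[Edge[Int]] = sc.textFile(annotatedFile)
      .flatMap(line => parseLine(line).toList)
      // For an Int key k with 0 <= k < numPartitions, HashPartitioner
      // sends the record to partition k, so no custom Partitioner is needed.
      .partitionBy(new HashPartitioner(numPartitions))
      .mapPartitions(_.map(_._2), preservesPartitioning = true)
    // Vertices are derived from the edges, each with default attribute 1.
    Graph.fromEdges(edges, 1)
  }
}
```

With something like this, the call would be
AnnotatedEdgeLoader.load(sc, annotatedFile, numPartitions), and neither the
unannotated file nor customPartition would be needed. If the partition
numbers can fall outside 0 until numPartitions, HashPartitioner would wrap
them around with a modulo, which may not be the intended placement.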

Regards,
Arpit Kumar