Posted to user@spark.apache.org by Olivier Girardot <o....@lateral-thoughts.com> on 2017/03/13 18:15:41 UTC

Graphframes PageRank ends up on 1 partition

Hi everyone,
I'm trying to use the GraphFrames API on a moderately sized dataset (the
Bay Area bike share data), and something strange happens.
My edges and vertices are both repartitioned into ~100 partitions, but
when I call the PageRank algorithm from PySpark, here's what happens:

[image: Inline image 1]

So the "createRoutingTables" and "aggregateMessages" transformations
both end up on 1 partition, and I don't seem to be able to do anything
about it (especially from the PySpark API).
I'm using PySpark with Spark 2.1 and
graphframes:graphframes:0.3.0-spark2.0-s_2.11 as a package.
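
For context, here is a minimal sketch of the kind of setup described above.
It is an illustration, not the exact code: the function name `run_pagerank`,
its parameter names, and the `maxIter=10` value are assumptions. It only
assumes the vertices DataFrame has an "id" column and the edges DataFrame has
"src" and "dst" columns, as GraphFrames requires.

```python
def run_pagerank(vertices, edges, num_partitions=100, max_iter=10):
    """Repartition both inputs, build a GraphFrame, and run PageRank.

    `vertices` needs an "id" column; `edges` needs "src" and "dst" columns.
    The function name and default values are illustrative assumptions.
    """
    # Imported lazily: needs the graphframes package to be available,
    # e.g. via --packages graphframes:graphframes:0.3.0-spark2.0-s_2.11
    from graphframes import GraphFrame

    # Spread both inputs over the requested number of partitions
    v = vertices.repartition(num_partitions)
    e = edges.repartition(num_partitions)

    graph = GraphFrame(v, e)

    # Returns a new GraphFrame whose vertices carry a "pagerank" column
    return graph.pageRank(resetProbability=0.15, maxIter=max_iter)
```

Given two DataFrames (say, hypothetical `stations` and `trips`), calling
`run_pagerank(stations, trips).vertices.rdd.getNumPartitions()` shows how many
partitions the result has, which can be compared with the ~100 requested.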

Thanks for your help,

-- 
*Olivier Girardot*