Posted to issues@flink.apache.org by "Martin Kiefer (JIRA)" <ji...@apache.org> on 2015/02/22 17:12:11 UTC
[jira] [Created] (FLINK-1597) VertexCentricIterations create inefficient execution plans
Martin Kiefer created FLINK-1597:
------------------------------------
Summary: VertexCentricIterations create inefficient execution plans
Key: FLINK-1597
URL: https://issues.apache.org/jira/browse/FLINK-1597
Project: Flink
Issue Type: Bug
Components: Gelly
Affects Versions: master
Reporter: Martin Kiefer
I ran experiments with optimized versions of a graph algorithm that is meant to exploit a secondary sort on the edges and a trade-off between the number of supersteps and I/O. To my surprise, the optimizations barely affected the execution times. I narrowed the cause down to inefficient execution plans.
I assumed that the edge set would be partitioned once at the beginning of a VertexCentricIteration and never touched again, because it cannot change during the iteration. I think this should be the desired behavior. What actually happens is that the UDFs creating the edge set are pulled inside the iteration and executed in every superstep. This significantly harms the performance of graph algorithms.
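The expected behavior amounts to loop-invariant hoisting: since the edge set cannot change during the iteration, the work that builds it should run once, outside the superstep loop. The Python sketch below is only a toy cost model, not Flink or Gelly code. `prepare_edges`, `superstep` and `run` are hypothetical names, and the minimum-label propagation merely stands in for a real vertex-centric algorithm. It counts how often the edge-preparation step runs when it is hoisted versus when it is re-executed every superstep.

```python
from collections import defaultdict


def prepare_edges(edges, counter):
    """Hypothetical stand-in for the UDFs that build the edge set:
    group edges by source vertex (the 'partitioning' step) and sort
    each group by target (the 'secondary sort'). The counter records
    how many times this preparation runs."""
    counter["prepare_edges"] += 1
    grouped = defaultdict(list)
    for src, dst in edges:
        grouped[src].append(dst)
    return {src: sorted(dsts) for src, dsts in grouped.items()}


def superstep(values, adjacency):
    """One toy vertex-centric superstep: every vertex sends its value to
    its out-neighbours, and each vertex keeps the minimum it receives."""
    new_values = dict(values)
    for src, dsts in adjacency.items():
        for dst in dsts:
            new_values[dst] = min(new_values[dst], values[src])
    return new_values


def run(edges, values, supersteps, hoist):
    """Run the toy iteration; return final values and preparation count."""
    counter = defaultdict(int)
    # Desired plan: build the static edge set once, before the loop.
    adjacency = prepare_edges(edges, counter) if hoist else None
    for _ in range(supersteps):
        if not hoist:
            # Reported plan: edge-set UDFs are re-run in every superstep.
            adjacency = prepare_edges(edges, counter)
        values = superstep(values, adjacency)
    return values, counter["prepare_edges"]


if __name__ == "__main__":
    edges = [(1, 2), (2, 3), (3, 1), (1, 3)]
    start = {1: 1, 2: 2, 3: 3}
    for hoist in (True, False):
        values, calls = run(edges, start, supersteps=10, hoist=hoist)
        print(hoist, values, calls)
    # prints:
    # True {1: 1, 2: 1, 3: 1} 1
    # False {1: 1, 2: 1, 3: 1} 10
```

Both variants produce the same vertex values, but with `hoist=False` the edge preparation runs once per superstep, so its cost grows linearly with the number of supersteps instead of being paid once.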
As a simple example, have a look at the execution plan generated for the PageRankExample:
https://gist.github.com/martinkiefer/28a63f953477e3987b5d
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)