You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Martin Kiefer (JIRA)" <ji...@apache.org> on 2015/02/18 12:48:11 UTC

[jira] [Comment Edited] (FLINK-1552) Allow secondary sorts in Vertex Centric Iteration

    [ https://issues.apache.org/jira/browse/FLINK-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14325769#comment-14325769 ] 

Martin Kiefer edited comment on FLINK-1552 at 2/18/15 11:47 AM:
----------------------------------------------------------------

I worked on Approximate Maximum Weight Watchings. Optimal Min/Max Weight Matchings are usually found by variants of the Blossom algorithm, however, the algorithm parallelizes badly. You can obtain a 1/2-approximation with less complex algorithms.

Salihogulu and Widom proposed several graph algorithms in their paper "Optimizing Graph Algorithms on Pregel-like Systems" at VLDB 2014. Among these algorithms was a scalable variant of an Approximate Maximum Weight Matching Algorithm. http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf

I implemented it with Gelly in the plain version presented in the paper. Additionally, we implemented [Flink-1515], so we could provide an implementation with the optimization the authors called "Edge Cleaning on Demand (ECOD)". 
I also dirtily implemented a SortedVertexCentricIteration that does adress this issue and provided two additionaly variants making use of secondary sorts.

However, one could argue that this is not the most beautiful algorithm for an implementation with Gelly. At least not as an easy example. The algorithm requires you to find the maximum weight edge of vertices and somehow provide them in the next update step. You also need to do something that is equivalent to removing edges from vertices. So, you have to choose between two options of wich either one kind of lacks beauty:

1. Store all Edges in the VertexValue
We then can find the maximum vertex value in the update step and store it in the vertex value. We can easily remove edges from the vertex value.
This blows up the the VertexValue right from the beginning and makes the messaging CoGroup in a VertexCentricIteration senseless.

2. Store the VertexKeys of all removed edges in the VertexValue + self messaging
We can find the maximum vertex value in the messaging step and the vertex can send a message to itself to remember its decision. This hopefully has low cost because the message should not have to go over the network. We only store the vertex keys for deleted edges in the vertex state so we can ignore them in the messaging step.

We chose the latter option. 

If you are nevertheless interested in this algorithm I can give you access to the code so you can have a look at it.


was (Author: martinkiefer):
I worked on Approximate Maximum Weight Watchings. Optimal Min/Max Weight Matchings are usually found by variants of the Blossom algorithm, however, the algorithm parallelizes badly. You can obtain a 1/2-approximation with less complex algorithms.

Salihogulu and Widom proposed several graph algorithms in their paper "Optimizing Graph Algorithms on Pregel-like Systems" at VLDB 2014. Among these algorithms was a scalable variant of an Approximate Maximum Weight Matching Algorithm. http://ilpubs.stanford.edu:8090/1077/3/p535-salihoglu.pdf

I implemented it with Gelly in the plain version presented in the paper. Additionally, we implemented [Flink-1515], so we could provide an implementation with the optimization the authors called "Edge Cleaning on Demand (ECOD)". 
I also dirtily implemented a SortedVertexCentricIteration that does adress this issue and provided two additionaly variants making use of secondary sorts.

However, one could argue that this is not the most beautiful algorithm for an implementation with Gelly. The algorithm requires you to find the maximum weight edge of vertices and somehow provide them in the next update step. You also need to do something that is equivalent to removing edges from vertices. So, you have to choose between two options of wich either one kind of lacks beauty:

1. Store all Edges in the VertexValue
We then can find the maximum vertex value in the update step and store it in the vertex value. We can easily remove edges from the vertex value.
This blows up the the VertexValue right from the beginning and makes the messaging CoGroup in a VertexCentricIteration senseless.

2. Store the VertexKeys of all removed edges in the VertexValue + self messaging
We can find the maximum vertex value in the messaging step and the vertex can send a message to itself to remember its decision. This hopefully has low cost because the message should not have to go over the network. We only store the vertex keys for deleted edges in the vertex state so we can ignore them in the messaging step.

We chose the latter option. 

If you are nevertheless interested in this algorithm I can give you access to the code so you can have a look at it.

> Allow secondary sorts in Vertex Centric Iteration
> -------------------------------------------------
>
>                 Key: FLINK-1552
>                 URL: https://issues.apache.org/jira/browse/FLINK-1552
>             Project: Flink
>          Issue Type: Wish
>          Components: Gelly
>            Reporter: Martin Kiefer
>            Priority: Minor
>
> The `VertexCentricIteration` class holds the logic to transform a `VertexUpdateFunction` and a `MessagingFunction` into an iteration with two CoGroup operators working on the set of messages and edges. Graph algorithms can profit from implying an order on the edges or messages based on their value and/or the vertex ID. This can be implemented easily making use of secondary sorts. I would suggest extending the `VertexCentricIteration` to allow to specify these kind of orderings optionally.
> For example, this comes handy when it is necessary to find the edges with the minimum or maximum value or the algorithm requires to pick the edge with lower vertex ID for edges with equal value. Similar use cases might be found for orders on the messages. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)