Posted to commits@flink.apache.org by va...@apache.org on 2015/05/05 19:40:04 UTC

[4/4] flink git commit: [FLINK-1871] [gelly] [docs] added a link to the migration guide in the Spargel guide; moved the migration guide section to the end of Gelly guide; made a few corrections in the migration guide text

[FLINK-1871] [gelly] [docs] added a link to the migration guide in the Spargel guide;
moved the migration guide section to the end of Gelly guide;
made a few corrections in the migration guide text

This closes #600


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/60ec6830
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/60ec6830
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/60ec6830

Branch: refs/heads/master
Commit: 60ec683082124359162ac3c97b223b5cfe44cbbd
Parents: 0163e50
Author: vasia <va...@apache.org>
Authored: Tue May 5 17:04:31 2015 +0200
Committer: vasia <va...@apache.org>
Committed: Tue May 5 17:15:16 2015 +0200

----------------------------------------------------------------------
 docs/libs/gelly_guide.md                        | 154 ++++++++++---------
 docs/libs/spargel_guide.md                      |   4 +-
 .../example/ConnectedComponentsExample.java     |   1 +
 .../apache/flink/graph/example/GSAPageRank.java |   1 +
 4 files changed, 86 insertions(+), 74 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/60ec6830/docs/libs/gelly_guide.md
----------------------------------------------------------------------
diff --git a/docs/libs/gelly_guide.md b/docs/libs/gelly_guide.md
index a52b40c..8292968 100644
--- a/docs/libs/gelly_guide.md
+++ b/docs/libs/gelly_guide.md
@@ -440,50 +440,109 @@ public static final class Messenger extends MessagingFunction {...}
 
 [Back to top](#top)
 
+
+Graph Validation
+-----------
+
+Gelly provides a simple utility for performing validation checks on input graphs. Depending on the application context, a graph may or may not be valid according to certain criteria. For example, a user might need to validate whether their graph contains duplicate edges or whether its structure is bipartite. In order to validate a graph, one can define a custom `GraphValidator` and implement its `validate()` method. `InvalidVertexIdsValidator` is Gelly's pre-defined validator. It checks that the edge set contains valid vertex IDs, i.e. that the source and target IDs of all edges
+also exist in the vertex ID set.
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+// create a list of vertices with IDs = {1, 2, 3, 4, 5}
+List<Vertex<Long, Long>> vertices = ...
+
+// create a list of edges with IDs = {(1, 2), (1, 3), (2, 4), (5, 6)}
+List<Edge<Long, Long>> edges = ...
+
+Graph<Long, Long, Long> graph = Graph.fromCollection(vertices, edges, env);
+
+// will return false: 6 is an invalid ID
+graph.validate(new InvalidVertexIdsValidator<Long, Long, Long>()); 
+
+{% endhighlight %}
+
+[Back to top](#top)
+
+Library Methods
+-----------
+Gelly has a growing collection of graph algorithms for easily analyzing large-scale graphs. So far, the following library methods are implemented:
+
+* PageRank
+* Single-Source Shortest Paths
+* Label Propagation
+* Simple Community Detection
+* Connected Components
+
+Gelly's library methods can be used by simply calling the `run()` method on the input graph:
+
+{% highlight java %}
+ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
+
+Graph<Long, Long, NullValue> graph = ...
+
+// run Label Propagation for 30 iterations to detect communities on the input graph
+DataSet<Vertex<Long, Long>> verticesWithCommunity = graph.run(
+				new LabelPropagation<Long>(30)).getVertices();
+
+// print the result
+verticesWithCommunity.print();
+
+env.execute();
+{% endhighlight %}
+
+[Back to top](#top)
+
 Migrating Spargel Code to Gelly
 -----------
 
-Due to the natural mapping of Spargel components to Gelly components, applications can easily be migrated from one API to the other.
-General guidelines:
-* <strong>Vertex and Edge Abstractions</strong>: In Spargel, vertices and edges are defined using tuples (Tuple2 for vertices, Tuple2 for edges without values, Tuple3 for edges with values). Gelly presents a more intuitive vertex and edge representation by introducing the `Vertex` and `Edge` types. A `Vertex` is defined by an id and a value. If no value is provided, the value type should be set to `NullValue`. An `Edge` is defined by a source id,  a target id and a value. Since the source and target values are of the same type, only two type parameters are needed. If no value is provided, the value type should be set to `NullValue`.
+Gelly provides the old Spargel API functionality through its vertex-centric iteration methods. Applications can be easily migrated from one API to the other, using the following
+general guidelines:
 
-* <strong>Methods for Plain Edges and for Valued Edges</strong>: In Spargel, there are separate methods for edges with values and edges without values when running the vertex centric iteration (i.e. `withValuedEdges()`, `withPlainEdges()`). In Gelly, this distinction no longer needs to be made because an edge with no value will simply have a `NullValue` type.
+* <strong>Vertex and Edge Types</strong>: In Spargel, vertices and edges are defined using tuples (`Tuple2` for vertices, `Tuple2` for edges without values, `Tuple3` for edges with values). Gelly has a more intuitive [graph representation](#graph-representation) by introducing the `Vertex` and `Edge` types.
 
-* <strong>OutgoingEdge versus Edge</strong>: Spargel's `OutgoingEdge` is replaced by `Edge` in Gelly.
+* <strong>Methods for Plain Edges and for Valued Edges</strong>: In Spargel, there are separate methods for edges with values and edges without values when running the vertex-centric iteration (i.e. `withValuedEdges()`, `withPlainEdges()`). In Gelly, this distinction is no longer needed because an edge with no value will simply have a `NullValue` type.
 
-* <strong>Running a Vertex Centric Iteration</strong>: In Spargel, an iteration is ran by calling the `runOperation()` method on a `VertexCentricIteration`. The edge type (plain or valued) dictates the method to be called. The arguments are: a data set of edges, the vertex update function, the messaging function and the maximum number of iterations. The result is a DataSet<Tuple2<vertexId, vertexValue>> representing the updated vertices.
-In Gelly, an iteration is ran by calling `runVertexCentricIteration()` on a graph. The parameters given to this method are the vertex update function, the messaging function and the maximum number of iterations. The result is a new graph with updated vertex values.
+* <strong>OutgoingEdge</strong>: Spargel's `OutgoingEdge` is replaced by `Edge` in Gelly.
 
-* <strong>Configuring a Vertex Centric Iteration</strong>: In Spargel, an iteration is configured by directly setting a set of parameters on the VertexCentricIteration instance (e.g. `iteration.setName("Spargel Iteration")`). In Gelly, a vertex-centric iteration is configured using the `IterationConfiguration` object (e.g. iterationConfiguration.setName("Gelly Iteration")). An instance of this object is then passed as a final parameter to the `runVertexCentricIteration()` method.
+* <strong>Running a Vertex Centric Iteration</strong>: In Spargel, an iteration is run by calling the `runOperation()` method on a `VertexCentricIteration`. The edge type (plain or valued) dictates the method to be called. The arguments are: a data set of edges, the vertex update function, the messaging function and the maximum number of iterations. The result is a `DataSet<Tuple2<vertexId, vertexValue>>` representing the updated vertices.
+In Gelly, an iteration is run by calling `runVertexCentricIteration()` on a `Graph`. The parameters given to this method are the vertex update function, the messaging function and the maximum number of iterations. The result is a new `Graph` with updated vertex values.
+
+* <strong>Configuring a Vertex Centric Iteration</strong>: In Spargel, an iteration is configured by directly setting a set of parameters on the `VertexCentricIteration` instance (e.g. `iteration.setName("Spargel Iteration")`). In Gelly, a vertex-centric iteration is configured using the `IterationConfiguration` object (e.g. `iterationConfiguration.setName("Gelly Iteration")`). An instance of this object is then passed as a final parameter to the `runVertexCentricIteration()` method.
 
 * <strong>Record API</strong>: Spargel's Record API was completely removed from Gelly.
 
-In the following section, we present a step-by-step tutorial for moving the connected components algorithm from Spargel to Gelly.
+In the following section, we present a step-by-step example for porting the Connected Components algorithm from Spargel to Gelly.
 
-In Spargel, the edges and vertices are defined by a `DataSet<Tuple2<IdType, EdgeValue>>` and a `DataSet<Tuple2<IdType, VertexValue>>` respectively.
+In Spargel, the edges and vertices are defined by a `DataSet<Tuple2<IdType, EdgeValue>>` and a `DataSet<Tuple2<IdType, VertexValue>>` respectively:
 
 {% highlight java %}
+// Spargel API
 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 DataSet<Tuple2<Long, Long>> edges = ...
 DataSet<Tuple2<Long, Long>> initialVertices = vertexIds.map(new IdAssigner());
 
-DataSet<Tuple2<Long, Long>> result = initialVertices.runOperation(VertexCentricIteration.withPlainEdges(edges, new CCUpdater(), new CCMessenger(), maxIterations));
+DataSet<Tuple2<Long, Long>> result = initialVertices.runOperation(
+				VertexCentricIteration.withPlainEdges(edges, new CCUpdater(), new CCMessenger(),
+				maxIterations));
 
 result.print();
 env.execute("Spargel Connected Components");
 {% endhighlight %}
 
 In this algorithm, initially, each vertex has its own ID as a value (is in its own component).
-Hence, the need for `IdAssigner()` user defined function which is simply a map that takes a value and creates a tuple (value,value) out of it.
+This is why `IdAssigner()` is needed: it initializes the vertex values.
 
 <p class="text-center">
-    <img alt="Spargel Example Input" width="75%" src="img/spargel_example_input.png" />
+    <img alt="Spargel Example Input" width="75%" src="fig/spargel_example_input.png" />
 </p>
 
 In Gelly, the edges and vertices have a more intuitive definition: they are represented by separate types `Edge`, `Vertex`.
-After defining the edge data set, we can create a Graph from it.
+After defining the edge data set, we can create a `Graph` from it.
 
 {% highlight java %}
+// Gelly API
 ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
 DataSet<Edge<Long, NullValue>> edges = ...
 
@@ -494,18 +553,20 @@ Graph<Long, Long, NullValue> graph = Graph.fromDataSet(edges, new MapFunction<Lo
 			}
 		}, env);
 
-DataSet<Vertex<Long, Long>> result = graph.runVertexCentricIteration(new CCUpdater(),
-				new CCMessenger(), maxIterations).getVertices();
+DataSet<Vertex<Long, Long>> result = graph.runVertexCentricIteration(new CCUpdater(), new CCMessenger(), maxIterations)
+					.getVertices();
 
 result.print();
 env.execute("Gelly Connected Components");
 {% endhighlight %}
 
 Notice that when assigning the initial vertex IDs, there is no need to perform a separate map operation. The value is specified directly in the `fromDataSet()` method.
-Instead of calling `runOperation()` on the set of vertices, `runVertexCentricIteration()` is called on the graph instance.
-As previously stated, `runVertexCentricIteration` returns a new graph with the updated vertex values. In order to retrieve the result (since for this algorithm we are only interested in the vertex ids and their corresponding values), we will call the `getVertices()` method.
+Instead of calling `runOperation()` on the set of vertices, `runVertexCentricIteration()` is called on the `Graph` instance.
+As previously stated, `runVertexCentricIteration()` returns a new `Graph` with the updated vertex values. Since for this algorithm we are only interested in the vertex IDs and their corresponding values, we retrieve the result by calling the `getVertices()` method.
+
+The user-defined `VertexUpdateFunction` and `MessagingFunction` are the same in Gelly, so you can reuse them without any changes.
 
-In the connected components algorithm, the vertices propagate their current component ID in iterations, each time adopting a new value from the received neighbor IDs, provided that the value is smaller than the current minimum.
+In the Connected Components algorithm, the vertices propagate their current component ID in iterations, each time adopting a new value from the received neighbor IDs, provided that the value is smaller than the current minimum.
 To this end, we iterate over all received messages and update the vertex value, if necessary:
 
 {% highlight java %}
@@ -533,69 +594,16 @@ public static final class CCMessenger extends MessagingFunction<Long, Long, Long
 }
 {% endhighlight %}
 
-The fact that the two names for the `VertexUpdateFunction` and for the `MessagingFunction`: `CCUpdater` and `CCMessenger` coincide in Spargel's `runOperation()` and in Gelly's `runVertexCentricIteration()` implies that the classes defined for the Spargel algortihm remain unchanged in Gelly.
 
 Similarly to Spargel, if the value of a vertex does not change during a superstep, it will **not send** any messages in the superstep. This allows to do incremental updates to the **hot (changing) parts** of the graph, while leaving **cold (steady) parts** untouched.
 
 The computation **terminates** after a specified *maximum number of supersteps* **-OR-** when the *vertex states stop changing*.
 
 <p class="text-center">
-    <img alt="Spargel Example" width="75%" src="img/spargel_example.png" />
+    <img alt="Spargel Example" width="75%" src="fig/spargel_example.png" />
 </p>
 
 [Back to top](#top)
 
-Graph Validation
------------
-
-Gelly provides a simple utility for performing validation checks on input graphs. Depending on the application context, a graph may or may not be valid according to certain criteria. For example, a user might need to validate whether their graph contains duplicate edges or whether its structure is bipartite. In order to validate a graph, one can define a custom `GraphValidator` and implement its `validate()` method. `InvalidVertexIdsValidator` is Gelly's pre-defined validator. It checks that the edge set contains valid vertex IDs, i.e. that all edge IDs
-also exist in the vertex IDs set.
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-// create a list of vertices with IDs = {1, 2, 3, 4, 5}
-List<Vertex<Long, Long>> vertices = ...
-
-// create a list of edges with IDs = {(1, 2) (1, 3), (2, 4), (5, 6)}
-List<Edge<Long, Long>> edges = ...
-
-Graph<Long, Long, Long> graph = Graph.fromCollection(vertices, edges, env);
-
-// will return false: 6 is an invalid ID
-graph.validate(new InvalidVertexIdsValidator<Long, Long, Long>()); 
-
-{% endhighlight %}
-
-[Back to top](#top)
-
-Library Methods
------------
-Gelly has a growing collection of graph algorithms for easily analyzing large-scale Graphs. So far, the following library methods are implemented:
-
-* PageRank
-* Single-Source Shortest Paths
-* Label Propagation
-* Simple Community Detection
-* Connected Components
-
-Gelly's library methods can be used by simply calling the `run()` method on the input graph:
-
-{% highlight java %}
-ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
-
-Graph<Long, Long, NullValue> graph = ...
-
-// run Label Propagation for 30 iterations to detect communities on the input graph
-DataSet<Vertex<Long, Long>> verticesWithCommunity = graph.run(
-				new LabelPropagation<Long>(30)).getVertices();
-
-// print the result
-verticesWithCommunity.print();
-
-env.execute();
-{% endhighlight %}
-
-[Back to top](#top)
 
 

http://git-wip-us.apache.org/repos/asf/flink/blob/60ec6830/docs/libs/spargel_guide.md
----------------------------------------------------------------------
diff --git a/docs/libs/spargel_guide.md b/docs/libs/spargel_guide.md
index 108b8f7..ab69783 100644
--- a/docs/libs/spargel_guide.md
+++ b/docs/libs/spargel_guide.md
@@ -32,8 +32,10 @@ This vertex-centric view makes it easy to express a large class of graph problem
 * This will be replaced by the TOC
 {:toc}
 
-Spargel API - DEPRECATED (Please check out new [Gelly API](gelly_guide.html) for graph processing with Apache Flink)
+Spargel API - DEPRECATED
 -----------
+The Spargel API is deprecated. Please check out the new [Gelly API](gelly_guide.html) for graph processing with Apache Flink. If you want to port your Spargel code to Gelly,
+please check the [migration guide](gelly_guide.html#migrating-spargel-code-to-gelly).
 
 The Spargel API is part of the *addons* Maven project. All relevant classes are located in the *org.apache.flink.spargel.java* package.
 

http://git-wip-us.apache.org/repos/asf/flink/blob/60ec6830/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponentsExample.java
----------------------------------------------------------------------
diff --git a/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponentsExample.java b/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponentsExample.java
index 63265b6..a185a70 100644
--- a/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponentsExample.java
+++ b/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/ConnectedComponentsExample.java
@@ -119,6 +119,7 @@ public class ConnectedComponentsExample implements ProgramDescription {
 		return true;
 	}
 
+	@SuppressWarnings("serial")
 	private static DataSet<Edge<Long, NullValue>> getEdgesDataSet(ExecutionEnvironment env) {
 
 		if(fileOutput) {

http://git-wip-us.apache.org/repos/asf/flink/blob/60ec6830/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAPageRank.java
----------------------------------------------------------------------
diff --git a/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAPageRank.java b/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAPageRank.java
index b6f0c87..b27a8fb 100644
--- a/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAPageRank.java
+++ b/flink-staging/flink-gelly/src/main/java/org/apache/flink/graph/example/GSAPageRank.java
@@ -45,6 +45,7 @@ import org.apache.flink.util.Collector;
  */
 public class GSAPageRank implements ProgramDescription {
 
+	@SuppressWarnings("serial")
 	public static void main(String[] args) throws Exception {
 
 		if(!parseParameters(args)) {