You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@flink.apache.org by gr...@apache.org on 2017/03/02 23:33:23 UTC

[2/2] flink git commit: [FLINK-5597] [docs] Improve the LocalClusteringCoefficient documentation

[FLINK-5597] [docs] Improve the LocalClusteringCoefficient documentation

Update the documentation for Gelly's library methods with improved
algorithm descriptions and explanation of algorithm results.

This closes #3404


Project: http://git-wip-us.apache.org/repos/asf/flink/repo
Commit: http://git-wip-us.apache.org/repos/asf/flink/commit/cb9e409b
Tree: http://git-wip-us.apache.org/repos/asf/flink/tree/cb9e409b
Diff: http://git-wip-us.apache.org/repos/asf/flink/diff/cb9e409b

Branch: refs/heads/master
Commit: cb9e409b764f95e07441a0c8da6c24e21bc1564b
Parents: ea14053
Author: Greg Hogan <co...@greghogan.com>
Authored: Thu Feb 23 16:26:45 2017 -0500
Committer: Greg Hogan <co...@greghogan.com>
Committed: Thu Mar 2 16:42:58 2017 -0500

----------------------------------------------------------------------
 docs/dev/libs/gelly/library_methods.md          | 92 ++++++++++----------
 .../directed/LocalClusteringCoefficient.java    |  4 +-
 .../clustering/directed/TriadicCensus.java      |  2 +-
 .../clustering/directed/TriangleListing.java    |  8 +-
 .../undirected/LocalClusteringCoefficient.java  |  4 +-
 .../clustering/undirected/TriadicCensus.java    |  4 +-
 .../clustering/undirected/TriangleListing.java  | 15 ++--
 .../flink/graph/library/link_analysis/HITS.java |  4 +-
 .../graph/library/similarity/AdamicAdar.java    |  6 +-
 .../graph/library/similarity/JaccardIndex.java  | 10 +--
 .../graph/utils/proxy/OptionalBoolean.java      |  2 +-
 11 files changed, 79 insertions(+), 72 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/docs/dev/libs/gelly/library_methods.md
----------------------------------------------------------------------
diff --git a/docs/dev/libs/gelly/library_methods.md b/docs/dev/libs/gelly/library_methods.md
index 94eee2e..026bea4 100644
--- a/docs/dev/libs/gelly/library_methods.md
+++ b/docs/dev/libs/gelly/library_methods.md
@@ -166,18 +166,6 @@ The algorithm is implemented using [gather-sum-apply iterations](#gather-sum-app
 
 See the [Single Source Shortest Paths](#single-source-shortest-paths) library method for implementation details and usage information.
 
-## Triangle Count
-
-#### Overview
-An analytic for counting the number of unique triangles in a graph.
-
-#### Details
-Counts the triangles generated by [Triangle Listing](#triangle-listing).
-
-#### Usage
-The analytic takes an undirected graph as input and returns as a result a `Long` corresponding to the number of triangles
-in the graph. The graph ID type must be `Comparable` and `Copyable`.
-
 ## Triangle Enumerator
 
 #### Overview
@@ -229,11 +217,12 @@ neighbors) to 1.0 (complete graph).
 
 #### Details
 See the [Local Clustering Coefficient](#local-clustering-coefficient) library method for a detailed explanation of
-clustering coefficient.
+clustering coefficient. The Average Clustering Coefficient is the average of the Local Clustering Coefficient scores
+over all vertices with at least two neighbors. Each vertex, independent of degree, has equal weight for this score.
 
 #### Usage
-Directed and undirected variants are provided. The algorithm takes a simple graph as input and outputs a result
-containing the total number of vertices and average local clustering coefficient of the graph. The graph ID type must be
+Directed and undirected variants are provided. The analytics take a simple graph as input and output an `AnalyticResult`
+containing the total number of vertices and average clustering coefficient of the graph. The graph ID type must be
 `Comparable` and `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
@@ -246,12 +235,14 @@ neighbors) to 1.0 (complete graph).
 
 #### Details
 See the [Local Clustering Coefficient](#local-clustering-coefficient) library method for a detailed explanation of
-clustering coefficient.
+clustering coefficient. The Global Clustering Coefficient is the ratio of connected neighbors over the entire graph.
+Vertices with higher degrees have greater weight for this score because the count of neighbor pairs is quadratic in
+degree.
 
 #### Usage
-Directed and undirected variants are provided. The algorithm takes a simple graph as input and outputs a result
-containing the total number of triplets and triangles in the graph. The graph ID type must be `Comparable` and
-`Copyable`.
+Directed and undirected variants are provided. The analytics take a simple graph as input and output an `AnalyticResult`
+containing the total number of triplets and triangles in the graph. The result class provides a method to compute the
+global clustering coefficient score. The graph ID type must be `Comparable` and `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 
@@ -262,19 +253,20 @@ The local clustering coefficient measures the connectedness of each vertex's nei
 edges between neighbors) to 1.0 (neighborhood is a clique).
 
 #### Details
-An edge between a vertex's neighbors is a triangle. Counting edges between neighbors is equivalent to counting the
+An edge between neighbors of a vertex is a triangle. Counting edges between neighbors is equivalent to counting the
 number of triangles which include the vertex. The clustering coefficient score is the number of edges between neighbors
 divided by the number of potential edges between neighbors.
 
-See the [Triangle Enumerator](#triangle-enumerator) library method for a detailed explanation of triangle enumeration.
+See the [Triangle Listing](#triangle-listing) library method for a detailed explanation of triangle enumeration.
 
 #### Usage
-Directed and undirected variants are provided. The algorithms take a simple graph as input and outputs a `DataSet` of
-tuples containing the vertex ID, vertex degree, and number of triangles containing the vertex. The graph ID type must be
-`Comparable` and `Copyable`.
+Directed and undirected variants are provided. The algorithms take a simple graph as input and output a `DataSet` of
+`UnaryResult` containing the vertex ID, vertex degree, and number of triangles containing the vertex. The result class
+provides a method to compute the local clustering coefficient score. The graph ID type must be `Comparable` and
+`Copyable`.
 
-* `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 * `setIncludeZeroDegreeVertices`: include results for vertices with a degree of zero
+* `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 
 ### Triadic Census
 
@@ -284,29 +276,35 @@ or unconnected. The [Triadic Census](http://vlado.fmf.uni-lj.si/pub/networks/doc
 occurrences of each type of triad with the graph.
 
 #### Details
-The analytics counts the four undirected triad types (formed with 0, 1, 2, or 3 connecting edges) or 16 directed triad
+This analytic counts the four undirected triad types (formed with 0, 1, 2, or 3 connecting edges) or 16 directed triad
 types by counting the triangles from [Triangle Listing](#triangle-listing) and running [Vertex Metrics](#vertex-metrics)
 to obtain the number of triplets and edges. Triangle counts are then deducted from triplet counts, and triangle and
 triplet counts are removed from edge counts.
 
 #### Usage
-Directed and undirected variants are provided. The analytics take a simple graph as input and outputs a `Result` with
-accessor methods for the computed statistics. The graph ID type must be `Comparable` and `Copyable`.
+Directed and undirected variants are provided. The analytics take a simple graph as input and output an
+`AnalyticResult` with accessor methods for querying the count of each triad type. The graph ID type must be
+`Comparable` and `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 
 ### Triangle Listing
 
 #### Overview
-This algorithm supports object reuse. The graph ID type must be `Comparable` and `Copyable`.
+Enumerates all triangles in the graph. A triangle is composed of three edges connecting three vertices into cliques of
+size 3.
 
 #### Details
-See the [Triangle Enumerator](#triangle-enumerator) library method for implementation details.
+Triangles are listed by joining open triplets (two edges with a common neighbor) against edges on the triplet endpoints.
+This implementation uses optimizations from
+[Schank's algorithm](http://i11www.iti.uni-karlsruhe.de/extra/publications/sw-fclt-05_t.pdf) to improve performance with
+high-degree vertices. Triplets are generated from the lowest degree vertex since each triangle need only be listed once.
+This greatly reduces the number of generated triplets which is quadratic in vertex degree.
 
 #### Usage
-Directed and undirected variants are provided. The algorithms take a simple graph as input and outputs a `DataSet` of
-tuples containing the three triangle vertices and, for the directed algorithm, a bitmask marking each of the six
-potential triangle edges. The graph ID type must be `Comparable` and `Copyable`.
+Directed and undirected variants are provided. The algorithms take a simple graph as input and output a `DataSet` of
+`TertiaryResult` containing the three triangle vertices and, for the directed algorithm, a bitmask marking each of the
+six potential edges connecting the three vertices. The graph ID type must be `Comparable` and `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 * `setSortTriangleVertices`: normalize the triangle listing such that for each result (K0, K1, K2) the vertex IDs are sorted K0 < K1 < K2
@@ -324,11 +322,13 @@ good authorities and good authorities are those pointed to by many good hubs.
 Every vertex is assigned the same initial hub and authority scores. The algorithm then iteratively updates the scores
 until termination. During each iteration new hub scores are computed from the authority scores, then new authority
 scores are computed from the new hub scores. The scores are then normalized and optionally tested for convergence.
+HITS is similar to [PageRank](#pagerank) but vertex scores are emitted in full to each neighbor whereas in PageRank
+the vertex score is first divided by the number of neighbors.
 
 #### Usage
-The algorithm takes a directed graph as input and outputs a `DataSet` of `Tuple3` containing the vertex ID, hub score,
-and authority score. Termination is configured with a maximum number of iterations and/or a convergence threshold
-on the sum of the change in each score for each vertex between iterations.
+The algorithm takes a simple directed graph as input and outputs a `DataSet` of `UnaryResult` containing the vertex ID,
+hub score, and authority score. Termination is configured by the number of iterations and/or a convergence threshold on
+the iteration sum of the change in scores over all vertices.
 
 * `setParallelism`: override the operator parallelism
 
@@ -377,8 +377,8 @@ The statistics are computed over vertex degrees generated from `degree.annotate.
 `degree.annotate.undirected.VertexDegree`.
 
 #### Usage
-Directed and undirected variants are provided. The analytics take a simple graph as input and output a `Result` with
-accessor methods for the computed statistics. The graph ID type must be `Comparable`.
+Directed and undirected variants are provided. The analytics take a simple graph as input and output an `AnalyticResult`
+with accessor methods for the computed statistics. The graph ID type must be `Comparable`.
 
 * `setIncludeZeroDegreeVertices`: include results for vertices with a degree of zero
 * `setParallelism`: override the operator parallelism
@@ -398,8 +398,8 @@ The statistics are computed over edge degrees generated from `degree.annotate.di
 `degree.annotate.undirected.EdgeDegreePair` and grouped by vertex.
 
 #### Usage
-Directed and undirected variants are provided. The analytics take a simple graph as input and output a `Result` with
-accessor methods for the computed statistics. The graph ID type must be `Comparable`.
+Directed and undirected variants are provided. The analytics take a simple graph as input and output an `AnalyticResult`
+with accessor methods for the computed statistics. The graph ID type must be `Comparable`.
 
 * `setParallelism`: override the operator parallelism
 * `setReduceOnTargetId` (undirected only): the degree can be counted from either the edge source or target IDs. By default the source IDs are counted. Reducing on target IDs may optimize the algorithm if the input edge list is sorted by target ID
@@ -416,13 +416,13 @@ influential to each pair of neighbors.
 #### Details
 The algorithm first annotates each vertex with the inverse of the logarithm of the vertex degree then joins this score
 onto edges by source vertex. Grouping on the source vertex, each pair of neighbors is emitted with the vertex score.
-Grouping on two-paths, the Adamic-Adar score is summed.
+Grouping on vertex pairs, the Adamic-Adar score is summed.
 
 See the [Jaccard Index](#jaccard-index) library method for a similar algorithm.
 
 #### Usage
-The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs and
-the Adamic-Adair similarity score. The graph ID type must be `Comparable` and `Copyable`.
+The algorithm takes a simple undirected graph as input and outputs a `DataSet` of `BinaryResult` containing two vertex
+IDs and the Adamic-Adar similarity score. The graph ID type must be `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 * `setMinimumRatio`: filter out Adamic-Adar scores less than the given ratio times the average score
@@ -441,12 +441,12 @@ distinct neighbors is computed by storing the sum of degrees of the vertex pair
 neighbors, which are double-counted in the sum of degrees.
 
 The algorithm first annotates each edge with the target vertex's degree. Grouping on the source vertex, each pair of
-neighbors is emitted with the degree sum. Grouping on two-paths, the shared neighbors are counted.
+neighbors is emitted with the degree sum. Grouping on vertex pairs, the shared neighbors are counted.
 
 #### Usage
-The algorithm takes a simple, undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
+The algorithm takes a simple undirected graph as input and outputs a `DataSet` of tuples containing two vertex IDs,
 the number of shared neighbors, and the number of distinct neighbors. The result class provides a method to compute the
-Jaccard Index score. The graph ID type must be `Comparable` and `Copyable`.
+Jaccard Index score. The graph ID type must be `Copyable`.
 
 * `setLittleParallelism`: override the parallelism of operators processing small amounts of data
 * `setMaximumScore`: filter out Jaccard Index scores greater than or equal to the given maximum fraction

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/LocalClusteringCoefficient.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/LocalClusteringCoefficient.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/LocalClusteringCoefficient.java
index cad36b3..ffd4b13 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/LocalClusteringCoefficient.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/LocalClusteringCoefficient.java
@@ -44,11 +44,11 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
  * The local clustering coefficient measures the connectedness of each vertex's
  * neighborhood. Scores range from 0.0 (no edges between neighbors) to 1.0
  * (neighborhood is a clique).
- * <br/>
+ * <p>
  * An edge between a vertex's neighbors is a triangle. Counting edges between
  * neighbors is equivalent to counting the number of triangles which include
  * the vertex.
- * <br/>
+ * <p>
  * The input graph must be a simple graph containing no duplicate edges or
  * self-loops.
  *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriadicCensus.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriadicCensus.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriadicCensus.java
index 6c9e7b6..2274e3e 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriadicCensus.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriadicCensus.java
@@ -40,7 +40,7 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
 /**
  * A triad is formed by three connected or unconnected vertices in a graph.
  * The triadic census counts the occurrences of each type of triad.
- * <br/>
+ * <p>
  * http://vlado.fmf.uni-lj.si/pub/networks/doc/triads/triads.pdf
  *
  * @param <K> graph ID type

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriangleListing.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriangleListing.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriangleListing.java
index a1006c4..6b3e2a1 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriangleListing.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/directed/TriangleListing.java
@@ -51,11 +51,15 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
 
 /**
  * Generates a listing of distinct triangles from the input graph.
- * <br/>
+ * <p>
  * A triangle is a 3-clique with vertices A, B, and C connected by edges
  * (A, B), (A, C), and (B, C).
- * <br/>
+ * <p>
  * The input graph must not contain duplicate edges or self-loops.
+ * <p>
+ * This algorithm is similar to the undirected version but also tracks and
+ * computes a bitmask representing the six potential graph edges connecting
+ * the triangle vertices.
  *
  * @param <K> graph ID type
  * @param <VV> vertex value type

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/LocalClusteringCoefficient.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/LocalClusteringCoefficient.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/LocalClusteringCoefficient.java
index 99e3db9..31ddf45 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/LocalClusteringCoefficient.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/LocalClusteringCoefficient.java
@@ -44,11 +44,11 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
  * The local clustering coefficient measures the connectedness of each vertex's
  * neighborhood. Scores range from 0.0 (no edges between neighbors) to 1.0
  * (neighborhood is a clique).
- * <br/>
+ * <p>
  * An edge between a vertex's neighbors is a triangle. Counting edges between
  * neighbors is equivalent to counting the number of triangles which include
  * the vertex.
- * <br/>
+ * <p>
  * The input graph must be a simple, undirected graph containing no duplicate
  * edges or self-loops.
  *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriadicCensus.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriadicCensus.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriadicCensus.java
index 3f59d0e..7482af0 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriadicCensus.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriadicCensus.java
@@ -38,10 +38,10 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
 /**
  * A triad is formed by three connected or unconnected vertices in a graph.
  * The triadic census counts the occurrences of each type of triad.
- * <br/>
+ * <p>
  * The four types of undirected triads are formed with 0, 1, 2, or 3
  * connecting edges.
- * <br/>
+ * <p>
  * http://vlado.fmf.uni-lj.si/pub/networks/doc/triads/triads.pdf
  *
  * @param <K> graph ID type

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriangleListing.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriangleListing.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriangleListing.java
index 8850125..09b9a5d 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriangleListing.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/clustering/undirected/TriangleListing.java
@@ -48,15 +48,16 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
 
 /**
  * Generates a listing of distinct triangles from the input graph.
- * <br/>
+ * <p>
  * A triangle is a 3-cycle with vertices A, B, and C connected by edges
  * (A, B), (A, C), and (B, C).
- * <br/>
+ * <p>
  * The input graph must be a simple, undirected graph containing no duplicate
  * edges or self-loops.
- * <br/>
- * Algorithm from "Graph Twiddling in a MapReduce World", J. D. Cohen,
- * http://lintool.github.io/UMD-courses/bigdata-2015-Spring/content/Cohen_2009.pdf
+ * <p>
+ * Algorithm from "Finding, Counting and Listing all Triangles in Large Graphs,
+ * An Experimental Study", Thomas Schank and Dorothea Wagner.
+ * http://i11www.iti.uni-karlsruhe.de/extra/publications/sw-fclt-05_t.pdf
  *
  * @param <K> graph ID type
  * @param <VV> vertex value type
@@ -178,7 +179,7 @@ extends GraphAlgorithmWrappingDataSet<K, VV, EV, Tuple3<K, K, K>> {
 	/**
 	 * Removes edge values while filtering such that only edges where the
 	 * source vertex ID compares less than the target vertex ID are emitted.
-	 * <br/>
+	 * <p>
 	 * Since the input graph is a simple graph this filter removes exactly half
 	 * of the original edges.
 	 *
@@ -206,7 +207,7 @@ extends GraphAlgorithmWrappingDataSet<K, VV, EV, Tuple3<K, K, K>> {
 	 * vertex has lower degree are emitted. If the source and target vertex
 	 * degrees are equal then the edge is emitted if the source vertex ID
 	 * compares less than the target vertex ID.
-	 * <br/>
+	 * <p>
 	 * Since the input graph is a simple graph this filter removes exactly half
 	 * of the original edges.
 	 *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/link_analysis/HITS.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/link_analysis/HITS.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/link_analysis/HITS.java
index ecc1ad7..f4195f7 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/link_analysis/HITS.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/link_analysis/HITS.java
@@ -53,9 +53,11 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
  * Hyperlink-Induced Topic Search computes two interdependent scores for every
  * vertex in a directed graph. A good "hub" links to good "authorities" and
  * good "authorities" are linked from good "hubs".
- *
+ * <p>
  * This algorithm can be configured to terminate either by a limit on the number
  * of iterations, a convergence threshold, or both.
+ * <p>
+ * http://www.cs.cornell.edu/home/kleinber/auth.pdf
  *
  * http://www.cs.cornell.edu/home/kleinber/auth.pdf
  *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/AdamicAdar.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/AdamicAdar.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/AdamicAdar.java
index 00819e4..a7ba00a 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/AdamicAdar.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/AdamicAdar.java
@@ -53,16 +53,16 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
 
 /**
  * http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
- * <br/>
+ * <p>
  * Adamic-Adar measures the similarity between pairs of vertices as the sum of
  * the inverse logarithm of degree over shared neighbors. Scores are non-negative
  * and unbounded. A vertex with higher degree has greater overall influence but
  * is less influential to each pair of neighbors.
- * <br/>
+ * <p>
  * This implementation produces similarity scores for each pair of vertices
  * in the graph with at least one shared neighbor; equivalently, this is the
  * set of all non-zero Adamic-Adar coefficients.
- * <br/>
+ * <p>
  * The input graph must be a simple, undirected graph containing no duplicate
  * edges or self-loops.
  *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java
index 148d541..11ec73d 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/library/similarity/JaccardIndex.java
@@ -48,11 +48,11 @@ import static org.apache.flink.api.common.ExecutionConfig.PARALLELISM_DEFAULT;
  * is computed as the number of shared neighbors divided by the number of
  * distinct neighbors. Scores range from 0.0 (no shared neighbors) to 1.0 (all
  * neighbors are shared).
- * <br/>
+ * <p>
  * This implementation produces similarity scores for each pair of vertices
  * in the graph with at least one shared neighbor; equivalently, this is the
  * set of all non-zero Jaccard Similarity coefficients.
- * <br/>
+ * <p>
  * The input graph must be a simple, undirected graph containing no duplicate
  * edges or self-loops.
  *
@@ -240,10 +240,10 @@ extends GraphAlgorithmWrappingDataSet<K, VV, EV, Result<K>> {
 	 * This is the first of three operations implementing a self-join to generate
 	 * the full neighbor pairing for each vertex. The number of neighbor pairs
 	 * is (n choose 2) which is quadratic in the vertex degree.
-	 * <br/>
+	 * <p>
 	 * The third operation, {@link GenerateGroupPairs}, processes groups of size
 	 * {@link #groupSize} and emits {@code O(groupSize * deg(vertex))} pairs.
-	 * <br/>
+	 * <p>
 	 * This input to the third operation is still quadratic in the vertex degree.
 	 * Two prior operations, {@link GenerateGroupSpans} and {@link GenerateGroups},
 	 * each emit datasets linear in the vertex degree, with a forced rebalance
@@ -318,7 +318,7 @@ extends GraphAlgorithmWrappingDataSet<K, VV, EV, Result<K>> {
 
 	/**
 	 * Emits the two-path for all neighbor pairs in this group.
-	 * <br/>
+	 * <p>
 	 * The first {@link #groupSize} vertices are emitted pairwise. Following
 	 * vertices are only paired with vertices from this initial group.
 	 *

http://git-wip-us.apache.org/repos/asf/flink/blob/cb9e409b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/utils/proxy/OptionalBoolean.java
----------------------------------------------------------------------
diff --git a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/utils/proxy/OptionalBoolean.java b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/utils/proxy/OptionalBoolean.java
index 70a1294..7a7208a 100644
--- a/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/utils/proxy/OptionalBoolean.java
+++ b/flink-libraries/flink-gelly/src/main/java/org/apache/flink/graph/utils/proxy/OptionalBoolean.java
@@ -22,7 +22,7 @@ import org.apache.flink.graph.GraphAlgorithm;
 
 /**
  * A multi-state boolean.
- * <br/>
+ * <p>
  * This class is used by {@link GraphAlgorithm} configuration options to set a
  * default value which can be overwritten. The default value is also used when
  * algorithm configurations are merged and conflict.