Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/15 22:06:25 UTC
[jira] [Commented] (FLINK-3768) Clustering Coefficient
[ https://issues.apache.org/jira/browse/FLINK-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243506#comment-15243506 ]
ASF GitHub Bot commented on FLINK-3768:
---------------------------------------
GitHub user greghogan opened a pull request:
https://github.com/apache/flink/pull/1896
[FLINK-3768] [gelly] Clustering Coefficient
Provides an algorithm for local clustering coefficient and dependent functions for degree annotation, algorithm caching, and graph translation.
I also worked to improve the performance of `TriangleEnumerator`. Perhaps the API has changed, because `Edge.reverse()` is not in-place and, as a result, the edges were not being sorted by degree. The `JoinHint` is also important so that the `Triad`s are not spilled to disk.
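For readers unfamiliar with why the degree ordering matters, the following is a minimal, single-machine sketch of degree-ordered triangle enumeration. It is not the code of `TriangleEnumerator` or `TriangleListing`: the class `DegreeOrderedTriangles` and its methods are hypothetical names chosen for illustration, and the input is assumed to be a simple undirected graph with each edge listed once and no self-loops. Each edge is oriented from its lower-degree endpoint to its higher-degree endpoint, triads are generated only from pairs of outgoing edges, and a triad becomes a triangle when its closing edge exists. With this orientation, high-degree vertices have few outgoing edges and so produce far fewer triads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Flink or Gelly.
public class DegreeOrderedTriangles {

    // True if vertex a comes before vertex b in (degree, id) order.
    static boolean precedes(int a, int b, Map<Integer, Integer> degree) {
        int da = degree.get(a);
        int db = degree.get(b);
        return da < db || (da == db && a < b);
    }

    // Packs a directed edge (u, v) into a single long for set lookups.
    static long key(int u, int v) {
        return ((long) u << 32) | (v & 0xffffffffL);
    }

    // Returns each triangle once as {lowest, middle, highest} in (degree, id) order.
    // Assumes a simple graph: no self-loops, each undirected edge listed once.
    static List<int[]> triangles(int[][] edges) {
        Map<Integer, Integer> degree = new HashMap<>();
        for (int[] e : edges) {
            degree.merge(e[0], 1, Integer::sum);
            degree.merge(e[1], 1, Integer::sum);
        }

        // Orient every edge from the lower-ordered to the higher-ordered endpoint.
        Map<Integer, List<Integer>> out = new HashMap<>();
        Set<Long> oriented = new HashSet<>();
        for (int[] e : edges) {
            int u = e[0];
            int v = e[1];
            if (precedes(v, u, degree)) {
                int tmp = u;
                u = v;
                v = tmp;
            }
            out.computeIfAbsent(u, k -> new ArrayList<>()).add(v);
            oriented.add(key(u, v));
        }

        // Each pair of outgoing edges is a triad; keep it if the closing edge exists.
        List<int[]> result = new ArrayList<>();
        for (Map.Entry<Integer, List<Integer>> entry : out.entrySet()) {
            List<Integer> targets = entry.getValue();
            for (int i = 0; i < targets.size(); i++) {
                for (int j = i + 1; j < targets.size(); j++) {
                    int a = targets.get(i);
                    int b = targets.get(j);
                    if (precedes(b, a, degree)) {
                        int tmp = a;
                        a = b;
                        b = tmp;
                    }
                    if (oriented.contains(key(a, b))) {
                        result.add(new int[] {entry.getKey(), a, b});
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {0, 2}, {1, 2}, {2, 3}, {1, 3}, {3, 4}};
        System.out.println("triangles: " + triangles(edges).size());
    }
}
```

In a distributed setting the closing-edge check is a join between triads and edges rather than a set lookup, which is where a join hint becomes relevant.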
On an AWS ec2.4xlarge (16 vcores, 30 GiB) I am seeing timings of 5s, 29s, and 183s for `TriangleListing`. With `TriangleEnumerator` the timings are 7s, 45s, and 281s. Without the `JoinHint` the latter two `TriangleEnumerator` timings are 58s and 347s.
Scale | ChecksumHashCode | Count
------|----------------------------|----------
16 | 0x0000d9086985f4ce | 15616010
18 | 0x0010eeb32a441365 | 82781436
20 | 0x014a9434bb57ddef | 423780284
The command I used to run the tests:
```
./bin/flink run --class org.apache.flink.graph.examples.TriangleListing ~/flink-gelly-examples_2.10-1.1-SNAPSHOT.jar --clip_and_flip false --output print --output hash --scale 16 --listing
```
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/greghogan/flink 3768_clustering_coefficient
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/1896.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #1896
----
commit aa1141f4d34f7af9c092ec76bf1a81de310aed16
Author: Greg Hogan <co...@greghogan.com>
Date: 2016-04-13T13:28:38Z
[FLINK-3768] [gelly] Clustering Coefficient
Provides an algorithm for local clustering coefficient and dependent
functions for degree annotation, algorithm caching, and graph translation.
----
> Clustering Coefficient
> ----------------------
>
> Key: FLINK-3768
> URL: https://issues.apache.org/jira/browse/FLINK-3768
> Project: Flink
> Issue Type: New Feature
> Components: Gelly
> Affects Versions: 1.1.0
> Reporter: Greg Hogan
> Assignee: Greg Hogan
>
> The local clustering coefficient measures the connectedness of each vertex's neighborhood. Values range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique).
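
The definition above can be made concrete with a small sketch that does not depend on Flink. This is not the Gelly implementation from the pull request: the class `LocalClusteringCoefficient` and its methods are hypothetical names used only for illustration, the graph is assumed to be undirected, and returning 0.0 for vertices with fewer than two neighbors is an assumed convention (the ratio is undefined there). For a vertex with d neighbors, the coefficient is the number of edges between those neighbors divided by d(d-1)/2, the number of neighbor pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not part of Flink or Gelly.
public class LocalClusteringCoefficient {

    // Builds an undirected adjacency map, ignoring self-loops and duplicate edges.
    static Map<Integer, Set<Integer>> adjacency(int[][] edges) {
        Map<Integer, Set<Integer>> adj = new HashMap<>();
        for (int[] e : edges) {
            if (e[0] == e[1]) {
                continue;
            }
            adj.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            adj.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        return adj;
    }

    // Fraction of neighbor pairs of the given vertex that are themselves connected.
    static double coefficient(Map<Integer, Set<Integer>> adj, int vertex) {
        Set<Integer> neighbors = adj.getOrDefault(vertex, new HashSet<>());
        int d = neighbors.size();
        if (d < 2) {
            return 0.0; // assumed convention: no neighbor pairs to compare
        }
        List<Integer> list = new ArrayList<>(neighbors);
        long links = 0;
        for (int i = 0; i < d; i++) {
            for (int j = i + 1; j < d; j++) {
                if (adj.get(list.get(i)).contains(list.get(j))) {
                    links++;
                }
            }
        }
        return links / (d * (d - 1) / 2.0);
    }

    public static void main(String[] args) {
        // Vertex 0 has neighbors 1, 2, 3; only the pair (1, 2) is linked, so 1 of 3 pairs.
        int[][] edges = {{0, 1}, {0, 2}, {0, 3}, {1, 2}};
        Map<Integer, Set<Integer>> adj = adjacency(edges);
        for (int v : adj.keySet()) {
            System.out.println(v + " -> " + coefficient(adj, v));
        }
    }
}
```

A vertex whose neighborhood is a clique (vertex 1 in the example, with neighbors 0 and 2 linked to each other) scores 1.0, matching the upper end of the range described in the issue.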