Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2016/04/15 22:06:25 UTC

[jira] [Commented] (FLINK-3768) Clustering Coefficient

    [ https://issues.apache.org/jira/browse/FLINK-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243506#comment-15243506 ] 

ASF GitHub Bot commented on FLINK-3768:
---------------------------------------

GitHub user greghogan opened a pull request:

    https://github.com/apache/flink/pull/1896

    [FLINK-3768] [gelly] Clustering Coefficient

    Provides an algorithm for local clustering coefficient and dependent functions for degree annotation, algorithm caching, and graph translation.
    
    I worked to improve the performance of `TriangleEnumerator`. Perhaps the API has changed, since `Edge.reverse()` is not in-place and, as a result, the edges were not being sorted by degree. The `JoinHint` is also important so that the `Triad`s are not spilled to disk.
    
    On an AWS ec2.4xlarge (16 vcores, 30 GiB) I am seeing timings of 5s, 29s, and 183s for `TriangleListing`. With `TriangleEnumerator` the timings are 7s, 45s, and 281s. Without the `JoinHint`, the latter two `TriangleEnumerator` timings are 58s and 347s.
    
    Scale | ChecksumHashCode | Count
    ------|----------------------------|----------
    16 | 0x0000d9086985f4ce | 15616010
    18 | 0x0010eeb32a441365 | 82781436
    20 | 0x014a9434bb57ddef | 423780284
    
    The command I used to run the tests:
    ```
    ./bin/flink run -class org.apache.flink.graph.examples.TriangleListing ~/flink-gelly-examples_2.10-1.1-SNAPSHOT.jar --clip_and_flip false --output print --output hash --scale 16 --listing
    ```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/greghogan/flink 3768_clustering_coefficient

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/1896.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1896
    
----
commit aa1141f4d34f7af9c092ec76bf1a81de310aed16
Author: Greg Hogan <co...@greghogan.com>
Date:   2016-04-13T13:28:38Z

    [FLINK-3768] [gelly] Clustering Coefficient
    
    Provides an algorithm for local clustering coefficient and dependent
    functions for degree annotation, algorithm caching, and graph translation.

----


> Clustering Coefficient
> ----------------------
>
>                 Key: FLINK-3768
>                 URL: https://issues.apache.org/jira/browse/FLINK-3768
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.1.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> The local clustering coefficient measures the connectedness of each vertex's neighborhood. Values range from 0.0 (no edges between neighbors) to 1.0 (neighborhood is a clique).
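
As an illustration of the definition in the issue description, here is a minimal single-machine sketch in plain Python. It does not use the Gelly API and is not the implementation in this pull request; the function name `local_clustering_coefficient` is hypothetical, and scoring vertices with fewer than two neighbors as 0.0 is an assumption made for the sketch. For each vertex it divides the number of edges between its neighbors by the number of possible neighbor pairs.

```python
from itertools import combinations


def local_clustering_coefficient(edges):
    """Return {vertex: coefficient} for an undirected graph given as (u, v) pairs."""
    # Build an adjacency set per vertex; duplicate edges collapse automatically.
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # self-loops do not contribute to a neighborhood
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    scores = {}
    for vertex, adjacent in neighbors.items():
        degree = len(adjacent)
        if degree < 2:
            # Assumption for this sketch: no neighbor pairs means a score of 0.0.
            scores[vertex] = 0.0
            continue
        # Count the neighbor pairs that are themselves connected by an edge.
        links = sum(1 for a, b in combinations(adjacent, 2) if b in neighbors[a])
        # Divide by the number of possible pairs, degree * (degree - 1) / 2.
        scores[vertex] = links / (degree * (degree - 1) / 2)
    return scores


# A triangle (1, 2, 3) with one extra edge hanging off vertex 3.
scores = local_clustering_coefficient([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In this small graph, vertices 1 and 2 have neighborhoods that form a clique, while vertex 3 has three neighbors of which only one pair is connected. The brute-force pair check is quadratic in the vertex degree, which is one reason a distributed implementation would typically count triangles once and aggregate them per vertex instead.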


