Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/02/09 16:29:42 UTC

[jira] [Commented] (FLINK-4896) PageRank algorithm for directed graphs

    [ https://issues.apache.org/jira/browse/FLINK-4896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859749#comment-15859749 ] 

ASF GitHub Bot commented on FLINK-4896:
---------------------------------------

Github user greghogan commented on the issue:

    https://github.com/apache/flink/pull/2733
  
    Benchmarks were run on a c4.xlarge with 4 slots and a 4 GB preallocated TaskManager heap. "EdgeList" measures the time to simplify the graph, a step that is needed because the library PageRank using scatter-gather ("PageRankSG") requires each vertex to have both incoming and outgoing edges. "PageRank" is the algorithm from this PR.
    
    Algorithm | Scale 16 | Scale 18
    ------------ | ------------- | -------------
    EdgeList | 2537 ms | 8779 ms
    PageRank | 9563 ms | 39558 ms
    PageRankSG | 11188 ms | 47736 ms
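
For a quick read of the table, the relative runtimes can be computed directly from the reported numbers. This is only a sketch: the names `timings_ms` and `ratio` are made up for illustration, and because the comment does not say whether the PageRankSG timing already includes the EdgeList step, only the direct ratio is shown.

```python
# Runtimes in milliseconds, copied from the table above (keyed by graph scale).
timings_ms = {
    "EdgeList": {16: 2537, 18: 8779},
    "PageRank": {16: 9563, 18: 39558},
    "PageRankSG": {16: 11188, 18: 47736},
}


def ratio(slower, faster, scale):
    """Return how many times longer `slower` ran than `faster` at one scale."""
    return timings_ms[slower][scale] / timings_ms[faster][scale]


for scale in (16, 18):
    r = ratio("PageRankSG", "PageRank", scale)
    print(f"scale {scale}: PageRankSG / PageRank = {r:.2f}")
# prints:
# scale 16: PageRankSG / PageRank = 1.17
# scale 18: PageRankSG / PageRank = 1.21
```

On these two runs the scatter-gather version took roughly 17% to 21% longer than the PR's algorithm, before any accounting for the EdgeList preprocessing.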


> PageRank algorithm for directed graphs
> --------------------------------------
>
>                 Key: FLINK-4896
>                 URL: https://issues.apache.org/jira/browse/FLINK-4896
>             Project: Flink
>          Issue Type: New Feature
>          Components: Gelly
>    Affects Versions: 1.2.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>
> Gelly includes PageRank implementations for scatter-gather and gather-sum-apply. Both ship with the warning "The implementation assumes that each page has at least one incoming and one outgoing link."
> PageRank is a directed algorithm and sources and sinks are common in directed graphs.
> Sinks drain the total score across the graph, which affects convergence and the balance of the random hop (convergence is not currently a feature of Gelly's PageRanks, as this is a very recent feature from FLINK-3888).
> Sources are handled nicely by the algorithm highlighted on Flink's features page under "Iterations and Delta Iterations", since score deltas are transmitted and a source's score never changes (it is always equal to the random hop probability divided by the vertex count).
>   https://flink.apache.org/features.html
> We should find an implementation featuring convergence and unrestricted processing of directed graphs and move other implementations to Gelly examples.
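
The effect of sources and sinks described above can be seen in a small, self-contained sketch. This is plain Python, not the implementation from the PR and not the Gelly API. The function name `pagerank`, its parameters, and the choice to spread sink score uniformly over all vertices are assumptions made for illustration.

```python
def pagerank(num_vertices, edges, damping=0.85, tol=1e-9, max_iter=200,
             redistribute_sinks=True):
    """Power-iteration PageRank over a directed edge list (illustrative only).

    `edges` is a list of (source, target) pairs with vertex ids in
    range(num_vertices). Iteration stops once the L1 change drops below `tol`.
    """
    out_degree = [0] * num_vertices
    for src, _ in edges:
        out_degree[src] += 1

    scores = [1.0 / num_vertices] * num_vertices
    for _ in range(max_iter):
        # Score held by sinks has no outgoing edge to travel along.
        sink_mass = sum(s for s, d in zip(scores, out_degree) if d == 0)

        # Every vertex receives the random-hop share; optionally the sinks'
        # score is also spread uniformly (an assumed policy) so the total
        # score stays at 1 instead of draining away.
        base = (1.0 - damping) / num_vertices
        if redistribute_sinks:
            base += damping * sink_mass / num_vertices

        new_scores = [base] * num_vertices
        for src, dst in edges:
            new_scores[dst] += damping * scores[src] / out_degree[src]

        delta = sum(abs(a - b) for a, b in zip(new_scores, scores))
        scores = new_scores
        if delta < tol:
            break
    return scores


# Vertex 0 is a source (no incoming edges); vertex 2 is a sink (no outgoing).
chain = [(0, 1), (1, 2)]
kept = pagerank(3, chain, redistribute_sinks=True)
drained = pagerank(3, chain, redistribute_sinks=False)
print(sum(kept), sum(drained), drained[0])
```

Without redistribution the total score falls well below 1 and the source settles at exactly the random hop probability divided by the vertex count, as the description says. With the uniform redistribution assumed here, the total stays at 1, but the source's score then also includes a share of the sink score.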


