Posted to dev@giraph.apache.org by "Roman Shaposhnik (JIRA)" <ji...@apache.org> on 2014/06/07 00:10:03 UTC

[jira] [Updated] (GIRAPH-800) Resolving mutations on a large graph causes timeouts

     [ https://issues.apache.org/jira/browse/GIRAPH-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Roman Shaposhnik updated GIRAPH-800:
------------------------------------

    Fix Version/s:     (was: 1.1.0)

> Resolving mutations on a large graph causes timeouts
> ----------------------------------------------------
>
>                 Key: GIRAPH-800
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-800
>             Project: Giraph
>          Issue Type: Bug
>          Components: graph
>    Affects Versions: 1.1.0
>         Environment: hadoop1
>            Reporter: Craig Muchinsky
>         Attachments: GIRAPH-800.patch
>
>
> When processing a graph with a large number of mutations and/or a large number of messages per superstep, the pre-superstep logic can appear to hang, and the job eventually times out, either because of MapReduce task inactivity or because it hits the maximum superstep wait.
> While it's possible to tune around this by adding a strategic call to context.progress() in NettyServerWorker.resolveMutations() and raising the giraph.maxMasterSuperstepWaitMsecs setting, this part of the code appears to need optimization.
> As an example, in a graph with 2B vertices and 2.5B edges the transition between supersteps with 1B messages in flight can take 15-30 minutes on a cluster with 228 workers (2 threads, 8GB RAM per worker).
> While the vertex resolve processing can be time consuming, I believe it's the check for missing vertices (the second loop within NettyServerWorker.resolveMutations()) that is the real performance bottleneck. I haven't identified a fix for this logic yet, but I did identify a possible workaround: I believe that when dealing with a static and complete graph, the resolveMutations() call can be skipped altogether. A quick test of this theory yielded a 3x performance improvement in my sandbox.
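
The two mitigations described in the issue (a periodic progress heartbeat inside the long resolve loop, and skipping the resolve pass entirely for a static, complete graph) can be sketched roughly as follows. This is a standalone illustration, not the code from GIRAPH-800.patch and not Giraph's actual API: the class name, the Progressable stand-in interface, the reportEvery parameter, and the staticGraph flag are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Giraph itself.
class ResolveMutationsSketch {
    /** Stand-in for the task context whose progress() call keeps the task alive. */
    interface Progressable {
        void progress();
    }

    /**
     * Walks the vertex ids that may need resolving and reports progress
     * every reportEvery ids. Returns the number of ids processed.
     */
    static int resolveMutations(List<Long> vertexIds, Progressable context,
                                int reportEvery, boolean staticGraph) {
        if (staticGraph) {
            // Assumed workaround: a static, complete graph has nothing to resolve.
            return 0;
        }
        int processed = 0;
        for (Long id : vertexIds) {
            // ... per-vertex resolve work for `id` would go here (omitted) ...
            processed++;
            if (processed % reportEvery == 0) {
                context.progress(); // heartbeat so the task is not seen as inactive
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        int[] heartbeats = {0};
        List<Long> ids = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        int n = resolveMutations(ids, () -> heartbeats[0]++, 2, false);
        System.out.println(n + " vertices processed, " + heartbeats[0] + " heartbeats");
    }
}
```

The heartbeat only prevents the inactivity timeout; it does not make the loop any faster. The skip flag is what would correspond to the reported 3x improvement, and it is only safe under the static-and-complete-graph assumption stated in the issue.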


