Posted to dev@giraph.apache.org by "Eli Reisman (JIRA)" <ji...@apache.org> on 2012/09/08 19:19:07 UTC

[jira] [Commented] (GIRAPH-322) Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of control message growth in large scale jobs

    [ https://issues.apache.org/jira/browse/GIRAPH-322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13451370#comment-13451370 ] 

Eli Reisman commented on GIRAPH-322:
------------------------------------

When I run the instrumented copy, everything runs fine and messages reach their destinations until the very last flush of the leftovers at the end of superstep 0, where we hit NettyWorkerClient#waitAllRequests. There, I see all the flushing workers stuck in a continual loop, endlessly trying to reconnect to their destinations. The weird thing is that I see nothing to indicate the destinations have crashed. I strongly suspect my request wiring is not quite right. I will take a deeper look in the next day or two. If anyone sees anything obvious, please let me know.

I will also try out the disk spill and tune it better, but I will be working on a smaller setup now, so I will need any good results confirmed in the end by those with more resources than I have. ;) But that's still a ways off...

                
> Run Length Encoding for Vertex#sendMessageToAllEdges might curb out of control message growth in large scale jobs
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: GIRAPH-322
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-322
>             Project: Giraph
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.2.0
>            Reporter: Eli Reisman
>            Assignee: Eli Reisman
>            Priority: Minor
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-322-1.patch, GIRAPH-322-2.patch
>
>
> Vertex#sendMessageToAllEdges is a case that goes against the grain of the data structures and code paths used to transport messages through a Giraph application and out onto the network. Messages to a single vertex can (and should) be combined in some applications that could make use of this broadcast messaging, but the out-of-control message growth of algorithms like triangle closing means we also need to de-duplicate messages bound for many vertices/partitions.
> This will be an evolving solution (this patch is just the first step), and it does not yet present a robust solution for disk-spill message stores. I figure I can get some advice about that, or it can be a follow-up JIRA if this turns out to be a fruitful pursuit. This first patch is also Netty-only and simply falls back to the old sendMessageToAllEdges() implementation if USE_NETTY is false. All this can be cleaned up when we know this works and/or is worth pursuing.
> The idea is to send as few broadcast messages as possible by run-length encoding their delivery and only duplicating messages on the network when they are bound for different partitions. This works best when combined with "-Dhash.userPartitionCount=# of workers", so that little of that duplication is needed.
> If this shows promise I will report back and keep working on this. As it is, it represents an end-to-end solution, using Netty, for in-memory messaging. It won't break with spill to disk, but you do lose the de-duplicating effect.
> More to follow, comments/ideas welcome. I expect this to change a lot as I test it and ideas/suggestions crop up.
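
The de-duplication idea in the description above can be sketched roughly as follows. This is only a minimal illustration, not the code in the attached patches: the class BroadcastGrouping and the methods partitionOf and groupTargetsByPartition are hypothetical names, and the modulo-based partition assignment is an assumption standing in for whatever partitioner the job actually uses. The point is that a broadcast to N neighbors needs only one copy of the message per destination partition, with the list of target vertex ids attached to that copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class BroadcastGrouping {
    /**
     * Hypothetical stand-in for a hash partitioner (an assumption, not
     * Giraph's implementation): maps a vertex id to a partition index.
     */
    static int partitionOf(long vertexId, int partitionCount) {
        return (int) Math.floorMod(vertexId, (long) partitionCount);
    }

    /**
     * Groups broadcast targets by destination partition, so the message
     * payload is sent once per partition instead of once per target.
     */
    static Map<Integer, List<Long>> groupTargetsByPartition(
            List<Long> targetIds, int partitionCount) {
        Map<Integer, List<Long>> grouped = new TreeMap<>();
        for (long id : targetIds) {
            grouped
                .computeIfAbsent(partitionOf(id, partitionCount),
                                 k -> new ArrayList<>())
                .add(id);
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 2L, 3L, 4L, 5L, 6L);
        Map<Integer, List<Long>> grouped = groupTargetsByPartition(targets, 2);
        // Naive broadcast: one payload copy per target vertex.
        System.out.println("copies without grouping: " + targets.size());
        // Grouped broadcast: one payload copy per destination partition.
        System.out.println("copies with grouping: " + grouped.size());
        System.out.println("targets per partition: " + grouped);
    }
}
```

With fewer partitions (for example, one per worker), more targets share each copy, which is why the partition-count setting mentioned in the description matters for this approach.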
