Posted to dev@hama.apache.org by "Thomas Jungblut (JIRA)" <ji...@apache.org> on 2012/11/05 19:30:11 UTC

[jira] [Commented] (HAMA-629) Improve RPC Scalability Part 2

    [ https://issues.apache.org/jira/browse/HAMA-629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13490796#comment-13490796 ] 

Thomas Jungblut commented on HAMA-629:
--------------------------------------

I've been reading about Erlang these days, and they deal with this problem in an interesting way (actually a very simple and naive one).
They build a hierarchy of the nodes in the cluster. A subcluster contains 10 nodes (maybe the number of nodes in a rack?) that are completely interconnected, so they hold an RPC or socket connection to each other the whole time.

If a message needs to be sent outside the subcluster, the node sends it to an aggregator (or one of multiple aggregators), and the aggregator forwards it to the real destination.

That's how they achieve scalability; a large number of TCP connections is certainly a big problem there as well.
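
A minimal sketch of the routing idea above, in Java. Everything here is an assumption for illustration and is neither Hama nor Erlang API: the class name SubclusterRouter, integer node ids laid out contiguously (subcluster = id / size), the first node of each subcluster acting as its single aggregator, and the sender's own aggregator forwarding straight to the destination.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical illustration of subcluster routing; not part of Hama. */
final class SubclusterRouter {
  private final int subclusterSize;

  SubclusterRouter(int subclusterSize) {
    if (subclusterSize <= 0) {
      throw new IllegalArgumentException("subclusterSize must be positive");
    }
    this.subclusterSize = subclusterSize;
  }

  /** Assumption: node ids are contiguous, so the subcluster is id / size. */
  int subclusterOf(int node) {
    return node / subclusterSize;
  }

  /** Assumption: the first node of each subcluster is its aggregator. */
  int aggregatorOf(int node) {
    return subclusterOf(node) * subclusterSize;
  }

  /** Returns the hops a message takes, starting with the source itself. */
  List<Integer> route(int source, int destination) {
    List<Integer> hops = new ArrayList<>();
    hops.add(source);
    if (subclusterOf(source) != subclusterOf(destination)) {
      // Leaving the subcluster: go through the local aggregator first.
      int aggregator = aggregatorOf(source);
      if (aggregator != source) {
        hops.add(aggregator);
      }
    }
    if (destination != source) {
      hops.add(destination);
    }
    return hops;
  }

  /** Counts distinct nodes that connect directly to the destination. */
  int inboundConnections(int destination, int totalNodes) {
    Set<Integer> lastHops = new HashSet<>();
    for (int source = 0; source < totalNodes; source++) {
      if (source == destination) {
        continue;
      }
      List<Integer> hops = route(source, destination);
      lastHops.add(hops.get(hops.size() - 2));
    }
    return lastHops.size();
  }

  public static void main(String[] args) {
    SubclusterRouter router = new SubclusterRouter(10);
    System.out.println(router.route(13, 42));
    System.out.println(router.inboundConnections(0, 1000));
  }
}
```

Under these assumptions, a node that every other node sends to only sees connections from its own subcluster peers plus one aggregator per remote subcluster, instead of one connection per sender. The price is an extra hop and more load on the aggregators.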
                
> Improve RPC Scalability Part 2
> ------------------------------
>
>                 Key: HAMA-629
>                 URL: https://issues.apache.org/jira/browse/HAMA-629
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>
> There is a problem when all 1k peers attempt to send to a single peer (let's say a master task in a graph algorithm that aggregates). In this case the peer will start 1k threads, which uses an enormous amount of memory. 
> I think we can coordinate the message sending either with ZooKeeper or by using the task id to build a smarter sending chain.
> By the latter, I mean that each task can start at a different offset in the peer array when sending messages to the other peers. But this won't solve the problem of DDoS'ing a single master task.
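
The offset idea from the issue description could be sketched roughly as follows. The class and method names are hypothetical and not part of Hama. The sketch assumes task ids are 0..numPeers-1, that each task's id is its own index in the peer array, and that all tasks advance through their send order in lockstep.

```java
/** Hypothetical illustration of staggered send order; not part of Hama. */
final class StaggeredSendOrder {

  /**
   * Returns the order in which a task visits the peer array. Each task
   * starts at its own index (assumed to equal its task id) and wraps around.
   */
  static int[] sendOrder(int taskId, int numPeers) {
    if (numPeers <= 0 || taskId < 0 || taskId >= numPeers) {
      throw new IllegalArgumentException("taskId must be in [0, numPeers)");
    }
    int[] order = new int[numPeers];
    for (int step = 0; step < numPeers; step++) {
      order[step] = (taskId + step) % numPeers;
    }
    return order;
  }

  /** Highest number of tasks targeting the same peer at the given step. */
  static int maxSendersPerTarget(int numPeers, int step) {
    int[] inbound = new int[numPeers];
    int max = 0;
    for (int task = 0; task < numPeers; task++) {
      int target = sendOrder(task, numPeers)[step];
      inbound[target]++;
      max = Math.max(max, inbound[target]);
    }
    return max;
  }

  public static void main(String[] args) {
    System.out.println(java.util.Arrays.toString(sendOrder(2, 5)));
    System.out.println(maxSendersPerTarget(1000, 3));
  }
}
```

With this rotation, each peer is targeted by exactly one task per step when every task has messages for every peer. It does not help when all messages are addressed to the same master task, which is the limitation the description already points out.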

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira