Posted to dev@hama.apache.org by "Thomas Jungblut (JIRA)" <ji...@apache.org> on 2012/06/18 11:01:46 UTC

[jira] [Created] (HAMA-593) Improve RPC scalability

Thomas Jungblut created HAMA-593:
------------------------------------

             Summary: Improve RPC scalability
                 Key: HAMA-593
                 URL: https://issues.apache.org/jira/browse/HAMA-593
             Project: Hama
          Issue Type: Sub-task
          Components: bsp core, messaging
    Affects Versions: 0.5.0
            Reporter: Thomas Jungblut


To improve scalability, we can open RPC connections one after another instead of keeping all N possible connections open the whole time.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HAMA-593) Improve RPC scalability

Posted by "Mayank Mishra (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Mishra updated HAMA-593:
-------------------------------

    Attachment: HAMA-593_2.patch

Got a day off to celebrate Independence Day. :) Adding a patch with LRU cache support. Connections are closed when their entry gets evicted from the cache. Please review it.
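
Closing a connection on eviction, as described above, might look roughly like the sketch below. It is not the code from the attached patch: `ClosingLruCache` and `maxOpen` are hypothetical names, and `java.io.Closeable` stands in for whatever connection type the message managers actually hold.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: an LRU map that closes a connection when it is evicted.
class ClosingLruCache<K, V extends Closeable> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int maxOpen;

  ClosingLruCache(int maxOpen) {
    // accessOrder = true, so the eldest entry is the least recently used one.
    super(16, 0.75f, true);
    this.maxOpen = maxOpen;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    if (size() <= maxOpen) {
      return false;
    }
    try {
      // Release the underlying resource before the entry is dropped.
      eldest.getValue().close();
    } catch (IOException e) {
      // Assumption: a failed close should not abort the put; real code would log it.
    }
    return true;
  }
}
```

Note that `removeEldestEntry` is only consulted by `put` and `putAll`, so entries taken out with `remove` or `clear` would still have to be closed by the caller.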
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Comment Edited] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433139#comment-13433139 ] 

Thomas Jungblut edited comment on HAMA-593 at 8/14/12 12:50 AM:
----------------------------------------------------------------

We could also use Guava's CacheBuilder.

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html

Guava should be in our classpath.

> Sorry, but I didn't get how a LinkedHashSet alone could be used to build an LRU cache.

Sending is a sequential operation, so the connection cache does not need to be synchronized and there is no need for synchronized structures. Simply subclassing LinkedHashMap and overriding removeEldestEntry should be enough.

See:
https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/datastructure/LRUCache.java
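
For readers without access to the link, the LinkedHashMap approach can be sketched as follows. This is a minimal, unsynchronized illustration, not a copy of the linked class; the name `SimpleLruCache` and the `capacity` parameter are made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical name; a minimal LRU cache built on LinkedHashMap.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
  private static final long serialVersionUID = 1L;
  private final int capacity;

  SimpleLruCache(int capacity) {
    // accessOrder = true: iteration runs from least to most recently accessed.
    super(16, 0.75f, true);
    this.capacity = capacity;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
    // Invoked after each put; returning true drops the least recently used entry.
    return size() > capacity;
  }
}
```

With a capacity of 2, putting "a" and "b", reading "a", and then putting "c" evicts "b", because "a" was touched more recently.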
                
      was (Author: thomas.jungblut):
    We could also use Guava's CacheBuilder.

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html

Guava should be in our classpath.

> Sorry, but I didn't get how a LinkedHashSet alone could be used to build an LRU cache.

Sending is a sequential operation, so the connection cache does not need to be synchronized and there is no need for synchronized structures. Simply subclassing LinkedHashMap and overriding removeEldestEntry should be enough.
                  
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Mayank Mishra (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433135#comment-13433135 ] 

Mayank Mishra commented on HAMA-593:
------------------------------------

Yes, I agree with the intent of using an LRU cache with an upper capacity. But don't you think that, rather than going with a synchronized LinkedHashMap, we should build the LRU cache by extending ConcurrentHashMap? That way we should still get good performance as concurrent usage increases. Sorry, but I didn't get how a LinkedHashSet alone could be used to build an LRU cache.
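
For comparison, the "synchronized + LinkedHashMap" option mentioned above could look like the sketch below. It is only an illustration: `SynchronizedLruExample` and `newSynchronizedLru` are hypothetical names. One thing worth keeping in mind is that ConcurrentHashMap keeps no access order and offers no `removeEldestEntry` hook, so an LRU cache built on it would need its own eviction bookkeeping.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class SynchronizedLruExample {
  // Builds an access-ordered LinkedHashMap capped at maxEntries and wraps it
  // so that every call goes through a single lock.
  static <K, V> Map<K, V> newSynchronizedLru(final int maxEntries) {
    LinkedHashMap<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
      private static final long serialVersionUID = 1L;

      @Override
      protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
      }
    };
    // In access order even get() reorders entries, so reads need the lock too.
    return Collections.synchronizedMap(lru);
  }
}
```

The wrapper serializes all access, which is simple but gives up the concurrency that motivates the ConcurrentHashMap suggestion.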

                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435840#comment-13435840 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

Oh sorry, I didn't want to disturb your holiday!
Will review it when I'm back from work. Thanks!
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436093#comment-13436093 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

The build is fine for me. I just added a constant key for the configuration and mapped it into the default conf. I also moved the LRU cache to our utils, because I'm sure it can be used elsewhere in the project.

Thank you very much for that contribution. I will open a follow-up for the other scalability problem.
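
The idea of a constant configuration key with a default value could be sketched like this. The key name, the default of 100, and the class `CacheSizeConfig` are all hypothetical and chosen only for illustration; `java.util.Properties` stands in for the project's own configuration object.

```java
import java.util.Properties;

class CacheSizeConfig {
  // Hypothetical key and default; the real names are defined by the committed patch.
  static final String MAX_CACHED_CONNECTIONS_KEY = "example.messenger.max.cached.connections";
  static final int DEFAULT_MAX_CACHED_CONNECTIONS = 100;

  // Returns the configured cache size, or the default when the key is absent.
  static int maxCachedConnections(Properties conf) {
    String raw = conf.getProperty(MAX_CACHED_CONNECTIONS_KEY);
    if (raw == null) {
      return DEFAULT_MAX_CACHED_CONNECTIONS;
    }
    return Integer.parseInt(raw.trim());
  }
}
```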
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>             Fix For: 0.6.0
>
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435991#comment-13435991 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

Looks good to me. I think we should move the constant to a higher abstraction level and make it configurable, but that is a minor thing that I will do myself.

Thanks!
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433139#comment-13433139 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

We could also use Guava's CacheBuilder.

http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html

Guava should be in our classpath.

> Sorry, but I didn't get how a LinkedHashSet alone could be used to build an LRU cache.

Sending is a sequential operation, so the connection cache does not need to be synchronized and there is no need for synchronized structures. Simply subclassing LinkedHashMap and overriding removeEldestEntry should be enough.
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Mayank Mishra (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431624#comment-13431624 ] 

Mayank Mishra commented on HAMA-593:
------------------------------------

Can I get some more description of this issue? Are we talking about getting BSPPeerConnections (i.e. HadoopMessageManager) one after another on a per-request basis, or about Avro's message bundle transmission over NettyTransceiver?
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Resolved] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Jungblut resolved HAMA-593.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 0.6.0
    
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>             Fix For: 0.6.0
>
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13433277#comment-13433277 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

Do you want to add this caching behaviour? 
Patch otherwise looks good to me. 

We have to add a follow-up issue for the other scalability bottleneck.
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435267#comment-13435267 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

Feel free to add another patch then; if not, I will commit this on Saturday and add the caching myself.
Thanks!
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431653#comment-13431653 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

In both of the managers (Hadoop|Avro) the RPC connections to the other peers are cached in a map called "peers".

This always leaves a connection open to every other task. Imagine you have 1k tasks: each task will hold 1k connections (999 outbound, 1 local).

1. The caching must be removed and the connections must be closed once the messages have been sent.

Then there is a problem when all 1k peers attempt to send to a single peer (say, a master task in a graph algorithm that aggregates). In this case the peer will start 1k threads, which uses an enormous amount of memory. I wouldn't mind if this could be done in a smarter way, but I have no solution at hand currently.

Would be cool if 1. could be resolved :)
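
To put rough numbers on the example above, the small helper below counts outbound connections held open across the whole job. It is a back-of-the-envelope sketch: `ConnectionCount` and `perTaskLimit` are hypothetical names, and the local connection of each task is deliberately left out of the count.

```java
class ConnectionCount {
  // Outbound connections held open cluster-wide when each task keeps at most
  // perTaskLimit peer connections open at a time.
  static long openConnections(long tasks, long perTaskLimit) {
    long outboundPeers = tasks - 1; // every other task, excluding the local one
    return tasks * Math.min(outboundPeers, perTaskLimit);
  }
}
```

With 1,000 tasks and no limit this gives 1,000 * 999 = 999,000 open connections; closing each connection right after sending (a limit of 1) brings it down to 1,000.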
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Assigned] (HAMA-593) Improve RPC scalability

Posted by "Edward J. Yoon (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon reassigned HAMA-593:
-----------------------------------

    Assignee: Mayank Mishra
    
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13436397#comment-13436397 ] 

Hudson commented on HAMA-593:
-----------------------------

Integrated in Hama-Nightly #644 (See [https://builds.apache.org/job/Hama-Nightly/644/])
    [HAMA-593]: Improve RPC scalability (Mayank Mishra via tjungblut) (Revision 1373915)

     Result = FAILURE
tjungblut : 
Files : 
* /hama/trunk/CHANGES.txt
* /hama/trunk/conf/hama-default.xml
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/AbstractMessageManager.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/AvroMessageManagerImpl.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/CompressableMessageManager.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/HadoopMessageManagerImpl.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/MemoryQueue.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/MessageManager.java
* /hama/trunk/core/src/main/java/org/apache/hama/bsp/message/Sender.java

                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>            Assignee: Mayank Mishra
>             Fix For: 0.6.0
>
>         Attachments: HAMA-593_1.patch, HAMA-593_2.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Commented] (HAMA-593) Improve RPC scalability

Posted by "Thomas Jungblut (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13432925#comment-13432925 ] 

Thomas Jungblut commented on HAMA-593:
--------------------------------------

Hi, I will review this soon.

Do you think it may be beneficial to configure a maximum number of tasks for which the sockets are cached?
Maybe a simple LRU cache backed by a LinkedHashSet.

                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.

[jira] [Updated] (HAMA-593) Improve RPC scalability

Posted by "Mayank Mishra (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HAMA-593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mayank Mishra updated HAMA-593:
-------------------------------

    Attachment: HAMA-593_1.patch

Patch for HAMA-593. Removed caching of RPC connections for both the Hadoop and Avro managers; connections are now created on demand and closed once the transfer to the peer has happened.
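
The create, send, close pattern described above can be sketched as follows. This is not the patch itself: `PeerConnection`, `PeerConnector` and `SequentialSender` are hypothetical types introduced for illustration, whereas the real managers work with Hadoop RPC or Avro proxies.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.Map;

// Hypothetical connection type used only for this sketch.
interface PeerConnection extends Closeable {
  void send(List<String> messages) throws IOException;
}

// Hypothetical factory that opens a connection to a named peer.
interface PeerConnector {
  PeerConnection connect(String peerName) throws IOException;
}

class SequentialSender {
  // Sends each peer's outgoing messages over a fresh connection and closes it
  // before moving on, so at most one outbound connection is open at any time.
  static void sendAll(Map<String, List<String>> outgoing, PeerConnector connector)
      throws IOException {
    for (Map.Entry<String, List<String>> entry : outgoing.entrySet()) {
      try (PeerConnection conn = connector.connect(entry.getKey())) {
        conn.send(entry.getValue());
      }
    }
  }
}
```

The try-with-resources block closes the connection even when `send` throws, at the cost of paying the connection setup again for every peer in every superstep.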
                
> Improve RPC scalability
> -----------------------
>
>                 Key: HAMA-593
>                 URL: https://issues.apache.org/jira/browse/HAMA-593
>             Project: Hama
>          Issue Type: Sub-task
>          Components: bsp core, messaging
>    Affects Versions: 0.5.0
>            Reporter: Thomas Jungblut
>         Attachments: HAMA-593_1.patch
>
>
> To improve scalability we can start a RPC connection after another instead of keeping all possible N connections open the whole time.
