Posted to dev@giraph.apache.org by "Jakob Homan (JIRA)" <ji...@apache.org> on 2011/09/17 12:11:08 UTC

[jira] [Created] (GIRAPH-37) Implement Netty-backed rpc solution

Implement Netty-backed rpc solution
-----------------------------------

                 Key: GIRAPH-37
                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
             Project: Giraph
          Issue Type: New Feature
            Reporter: Jakob Homan
            Assignee: Jakob Homan


GIRAPH-12 considered replacing the current Hadoop-based RPC method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128496#comment-13128496 ] 

Dmitriy V. Ryaboy commented on GIRAPH-37:
-----------------------------------------

Avery, I don't think Finagle's issue 40 is really relevant -- it's just a question about whether it's possible to have a Finagle-based project without building off Twitter's Scala project scaffolding called sbt-project (and the answer is yes, it's pretty simple).

I'll point Marius at this Jira, perhaps he'll address some of Jakob's concerns.

As far as publishing the compiled code -- that's actually pretty handy (not having to regenerate the stuff all the time). Maybe we could publish a "giraph-rpc" artifact and make main giraph rely on that? Then most developers (ones who don't care about RPC) won't be exposed to this at all, *and* they'll save compilation time by just having maven grab the pre-generated jar.
                
> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based RPC method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Bo Wang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269304#comment-13269304 ] 

Bo Wang commented on GIRAPH-37:
-------------------------------

This is really good news. RPC doesn't seem very scalable. Looking forward to the Netty implementation.
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107334#comment-13107334 ] 

Jake Mannix commented on GIRAPH-37:
-----------------------------------

Cool, you planning on trying Finagle?  It seems like it could save a lot of work in comparison to doing something totally custom on top of Netty (maven repo here: http://maven.twttr.com/com/twitter/finagle/1.9.0/ for the "whole thing", or smaller slices, like finagle-thrift, here: http://maven.twttr.com/com/twitter/finagle-thrift/1.9.0/ ).


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Hyunsik Choi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13109682#comment-13109682 ] 

Hyunsik Choi commented on GIRAPH-37:
------------------------------------

Two weeks ago, in GIRAPH-12, I said that I had tested an RPC system based on protobuf and Netty, that I needed more time, and that I would upload my progress. The link below is my ongoing work.

https://github.com/hyunsik/giraph-rpc

This is not complete. It needs more tests and more features such as Hadoop security, and it needs to handle exceptions well. However, I think it has the basic features. Since you seem to have started on this issue, I won't continue this work. I just hope the implementation will be of some help to your work :)


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128526#comment-13128526 ] 

Dmitriy V. Ryaboy commented on GIRAPH-37:
-----------------------------------------

Jakob, I haven't worked through the actual communication logic, but thought I'd drop in a few minor code style comments (this is just based on reading; I didn't try running it, busy week):

First, I would suggest namespacing the autogen stuff into a separate package (o.a.g.comm.finaglerpc.gen ?). That would make it easier to see what's our code and what is auto-generated.

The strings "giraph.rpc.impl" and "giraph.rpc.finagle.debug" should be public static final and documented.  In fact we might want to start GiraphConfigStrings or something as a central place to keep these, instead of spreading them in undiscoverable hard-coded strings all across the codebase à la Hadoop.  Same for static counter names like "FinagleRPC stats".

What do you mean by (Synchronized) in the documentation for FinagleRPCCommunications.transientInMessages and inVertexRangeMap? The maps that you are using are not inherently synchronized.  Do you want to init them with something like Collections.synchronizedMap(new HashMap()) ? Or did I miss your meaning?  Also, transientFoo is a bit confusing given that Java has a transient keyword and Foos aren't transient in that sense. inflightInMessages? 
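
To illustrate the question about synchronization, here is a minimal sketch of the two thread-safe alternatives to a plain HashMap. The class name, the String key and value types, and the "vertex-1" key are assumptions for illustration only; the actual fields in FinagleRPCCommunications are not shown in this thread and likely use different types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; not the code from the patch under review. */
final class SynchronizedInboxSketch {
  private SynchronizedInboxSketch() { }

  /** Option 1: every call on the returned map locks on the wrapper. */
  static Map<String, List<String>> newWrappedInbox() {
    return Collections.synchronizedMap(new HashMap<String, List<String>>());
  }

  /** Option 2: a map designed for concurrent access. */
  static Map<String, List<String>> newConcurrentInbox() {
    return new ConcurrentHashMap<String, List<String>>();
  }

  /** Adds a message; the per-vertex list must be thread-safe too, not just the map. */
  static void addMessage(Map<String, List<String>> inbox, String vertexId, String message) {
    inbox.computeIfAbsent(vertexId,
        id -> Collections.synchronizedList(new ArrayList<String>())).add(message);
  }

  /** Has several threads add to one vertex and returns the final message count. */
  static int countAfterConcurrentAdds(Map<String, List<String>> inbox,
      int threads, int perThread) {
    List<Thread> workers = new ArrayList<Thread>();
    for (int t = 0; t < threads; t++) {
      Thread worker = new Thread(() -> {
        for (int i = 0; i < perThread; i++) {
          addMessage(inbox, "vertex-1", "msg");
        }
      });
      workers.add(worker);
      worker.start();
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    }
    return inbox.get("vertex-1").size();
  }

  public static void main(String[] args) {
    System.out.println(countAfterConcurrentAdds(newWrappedInbox(), 4, 1000));
    System.out.println(countAfterConcurrentAdds(newConcurrentInbox(), 4, 1000));
  }
}
```

With a plain HashMap holding plain ArrayList values, the same concurrent adds could lose messages or corrupt the map, which is why the "(Synchronized)" note in the documentation needs to be backed by one of these choices.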

You specify things in comments like nullable and visible for testing that are available as annotations in Guava, which is nicer. That's not reason enough to bring in yet another dependency, but something to keep in mind.

MAX_VERTICES_PER_RPC -- we are going to want that to be configurable, right?
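
The two suggestions above (central, documented key constants and a configurable MAX_VERTICES_PER_RPC) could be sketched roughly as below. This is only an illustration: the key name "giraph.rpc.maxVerticesPerRpc", the default of 1000, and the helper method are assumptions, and java.util.Properties stands in for Hadoop's Configuration so the sketch runs without Hadoop on the classpath.

```java
import java.util.Properties;

/** Hypothetical central home for configuration strings; names are illustrative. */
final class GiraphConfigStrings {
  /** Key quoted in the review; presumably selects the RPC implementation. */
  public static final String RPC_IMPL = "giraph.rpc.impl";
  /** Key quoted in the review; presumably toggles Finagle RPC debugging. */
  public static final String RPC_FINAGLE_DEBUG = "giraph.rpc.finagle.debug";
  /** Counter name quoted in the review. */
  public static final String FINAGLE_RPC_STATS = "FinagleRPC stats";
  /** Assumed key name; the patch reportedly hard-codes MAX_VERTICES_PER_RPC. */
  public static final String MAX_VERTICES_PER_RPC = "giraph.rpc.maxVerticesPerRpc";
  /** Assumed default, chosen only for this example. */
  public static final int MAX_VERTICES_PER_RPC_DEFAULT = 1000;

  private GiraphConfigStrings() { }

  /** Reads the limit from the configuration, falling back to the default. */
  static int getMaxVerticesPerRpc(Properties conf) {
    String value = conf.getProperty(MAX_VERTICES_PER_RPC);
    return value == null ? MAX_VERTICES_PER_RPC_DEFAULT : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    System.out.println(getMaxVerticesPerRpc(conf));
    conf.setProperty(MAX_VERTICES_PER_RPC, "5000");
    System.out.println(getMaxVerticesPerRpc(conf));
  }
}
```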

In flush(), you are doing a lot of string concatenation in a peer-sized loop. Some of that is redundant (all but the name of the peer), and all of it would be better off with a StringBuilder.
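
As a rough sketch of the StringBuilder suggestion: the peer-independent part of the message is built once outside the loop, and only the peer name varies per iteration. The method name, message wording, and parameters are hypothetical, since the flush() code itself is not quoted in this thread.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of per-peer log building; not the patch's flush(). */
final class FlushLogSketch {
  private FlushLogSketch() { }

  static String describeFlush(List<String> peers, int messageCount) {
    // Everything except the peer name is the same for each peer, so build it once.
    String suffix = " flushed " + messageCount + " messages";
    StringBuilder sb = new StringBuilder();
    for (String peer : peers) {
      // append() avoids creating a fresh intermediate String per concatenation.
      sb.append("peer ").append(peer).append(suffix).append('\n');
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.print(describeFlush(Arrays.asList("worker-1", "worker-2"), 3));
  }
}
```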

Protocol Version -- why not? Granted, thrift is mostly backwards-compatible, but it's a bit of safety that won't hurt. Could just add it to the thrift struct.


                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271236#comment-13271236 ] 

jiraposter@reviews.apache.org commented on GIRAPH-37:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/
-----------------------------------------------------------

Review request for giraph.


Summary
-------

* Implemented a request/response protocol with netty as a NettyClient and NettyServer.  There is a NettyWorkerClient and NettyWorkerServer that implement WorkerClient and WorkerServer, respectively.  Netty is a lot faster since it's non-blocking and we can interleave computation and communication as opposed to Hadoop RPC (blocking).
* The netty server implementation uses concurrent hash maps to improve concurrency instead of synchronized blocks around maps.
* By default netty is used, but Hadoop RPC can be used with -Dgiraph.useNetty=false
* Changed the class hierarchy of ServerInterface to WorkerClientServer (WorkerClient and WorkerServer) to support a request/response protocol instead of just RPC
* In netty, the messages/mutations are gathered by partition and sent out as a partition's worth of messages/mutations
* Added two new test classes (RequestTest.java and ConnectionTest.java) to test all requests and check netty connections.
* PageRankBenchmark uses EdgeListVertex as a default
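
The idea of gathering messages by partition in a concurrent hash map, as described in the summary above, might look roughly like the following. This is a simplified sketch and not the code in SendMessageCache or ServerData: the class name, the modulo partitioning, and the String message type are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical per-partition message cache; not Giraph's actual implementation. */
final class PartitionMessageCacheSketch {
  private final int numPartitions;
  private final Map<Integer, ConcurrentLinkedQueue<String>> byPartition =
      new ConcurrentHashMap<Integer, ConcurrentLinkedQueue<String>>();

  PartitionMessageCacheSketch(int numPartitions) {
    this.numPartitions = numPartitions;
  }

  /** Assumed partitioning scheme: vertex id modulo the partition count. */
  int partitionOf(long vertexId) {
    return (int) Math.floorMod(vertexId, (long) numPartitions);
  }

  /** Safe to call from many threads without a synchronized block around the map. */
  void addMessage(long vertexId, String message) {
    byPartition.computeIfAbsent(partitionOf(vertexId),
        p -> new ConcurrentLinkedQueue<String>()).add(vertexId + ":" + message);
  }

  /**
   * Removes and returns one partition's worth of messages, ready to be sent as a
   * single request. The sketch assumes producers have finished before draining.
   */
  List<String> drainPartition(int partition) {
    ConcurrentLinkedQueue<String> queue = byPartition.remove(partition);
    return queue == null ? new ArrayList<String>() : new ArrayList<String>(queue);
  }

  public static void main(String[] args) {
    PartitionMessageCacheSketch cache = new PartitionMessageCacheSketch(4);
    cache.addMessage(1L, "a");
    cache.addMessage(5L, "b");
    cache.addMessage(2L, "c");
    System.out.println(cache.drainPartition(1));
    System.out.println(cache.drainPartition(2));
  }
}
```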


This addresses bug GIRAPH-37.
    https://issues.apache.org/jira/browse/GIRAPH-37


Diffs
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMutationsCache.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMessageCache.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestEncoder.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestDecoder.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1332888 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1332888 

Diff: https://reviews.apache.org/r/5074/diff


Testing
-------

'mvn verify' passes.  I ran several test runs to gather performance results.  Here is a simple example:

Hadoop RPC:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 5 -V 5000000 -s 5 -e 2 -v

12/05/09 01:59:56 INFO mapred.JobClient:   Giraph Timers
12/05/09 01:59:56 INFO mapred.JobClient:     Total (milliseconds)=167722
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 3 (milliseconds)=24775
12/05/09 01:59:56 INFO mapred.JobClient:     Setup (milliseconds)=2930
12/05/09 01:59:56 INFO mapred.JobClient:     Shutdown (milliseconds)=181
12/05/09 01:59:56 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=51025
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 0 (milliseconds)=21543
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 4 (milliseconds)=19858
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 5 (milliseconds)=2844
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 2 (milliseconds)=24507
12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 1 (milliseconds)=20054

Netty:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 5 -V 5000000 -s 5 -e 2 -v

12/05/09 02:06:10 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:06:10 INFO mapred.JobClient:     Total (milliseconds)=57795
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 3 (milliseconds)=7636
12/05/09 02:06:10 INFO mapred.JobClient:     Setup (milliseconds)=3574
12/05/09 02:06:10 INFO mapred.JobClient:     Shutdown (milliseconds)=232
12/05/09 02:06:10 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=13393
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 0 (milliseconds)=5610
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 4 (milliseconds)=8473
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 5 (milliseconds)=1844
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 2 (milliseconds)=7418
12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 1 (milliseconds)=9612

These were some median runs. The overall runtime improved from 167722 -> 57795 with Netty (2.9x faster).  Loading the vertices improved from 51025 -> 13393 (3.8x faster).  More results coming tomorrow, but for bigger runs, the improvement is likely to be even more than 3x.
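
The speedup figures quoted above can be re-derived from the timer values in the two logs. The small sketch below only repeats that arithmetic; the class and method names are made up for this example.

```java
import java.util.Locale;

/** Re-derives the speedups from the "Giraph Timers" values reported above. */
final class SpeedupSketch {
  private SpeedupSketch() { }

  /** Ratio of the baseline time to the improved time. */
  static double speedup(long baselineMillis, long improvedMillis) {
    return (double) baselineMillis / improvedMillis;
  }

  static String format(double ratio) {
    // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
    return String.format(Locale.ROOT, "%.1fx", ratio);
  }

  public static void main(String[] args) {
    // Total (milliseconds): Hadoop RPC vs. Netty.
    System.out.println("Total: " + format(speedup(167722L, 57795L)));
    // Vertex input superstep (milliseconds): Hadoop RPC vs. Netty.
    System.out.println("Vertex input: " + format(speedup(51025L, 13393L)));
  }
}
```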


Thanks,

Avery


                
> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based RPC method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Jake Mannix (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107165#comment-13107165 ] 

Jake Mannix commented on GIRAPH-37:
-----------------------------------

We should make sure we don't all work on the same thing (note the discussion at the end of GIRAPH-12) - two at a time might be fine, but half of the developers all on RPC might be excessive.  Do you want to take this one?  I was going to go in and try and implement a Finagle-based solution, as it's already an async RPC-system on top of Netty, but if you're already going to look at this, I can drop what I was doing and work on something else.


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Claudio Martella (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271242#comment-13271242 ] 

Claudio Martella commented on GIRAPH-37:
----------------------------------------

Hi Avery,

this is super impressive, both in terms of architecture change and benchmarking results.
Congratulations on the great work. I particularly welcome the per-partition inbox, which is something I also needed for out-of-core messages.

I'll try to review this as soon as possible, which is probably going to be during the weekend.
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Avery Ching (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128357#comment-13128357 ] 

Avery Ching commented on GIRAPH-37:
-----------------------------------

Jakob, this is a big patch, thanks for taking the time to try out Finagle and for the status update.  I'm hoping that Jake and/or Dmitriy will have some time and insight on dealing with some of the issues you've encountered.  In particular, the issues that are a little worrisome are the reliability (2/3 of the runs only) and the code compiled with a forked version of the Thrift compiler.

Not sure what this means:

https://github.com/twitter/finagle/issues/40

but it would be nice to not have to keep the generated code checked in, and instead generate the compiled source on the fly with a maven plugin.

Also, how were the performance numbers in comparison to the current approach?


                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107429#comment-13107429 ] 

Avery Ching commented on GIRAPH-37:
-----------------------------------

I don't know what Jakob has planned for security, but thanks for raising that point.  I have mixed feelings about security.  On the one hand, it's a nice feature to have for clients that require it and wouldn't be able to use Giraph without it.  On the other hand, it is the sole reason for the nasty preprocessor code (munge) in Giraph (not because of security per se, but rather because of how security is implemented, which is not backwards compatible).  It would be great if security were optional without preprocessor code in Giraph.  I'm sure Jakob and other Giraph devs have an opinion as well and will chime in.


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269303#comment-13269303 ] 

Avery Ching commented on GIRAPH-37:
-----------------------------------

Since Jakob had to switch gears, I wanted to let you guys know that I've spent a few days of the past week working on a netty-only replacement for communication.  I should have a patch and some performance numbers up in a few days.  Users will be able to choose between the old RPC way and this netty approach.  Netty is so customizable that it will likely take a lot of tuning to get the dials right for most cases.
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Brian Femiano (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271566#comment-13271566 ] 

Brian Femiano commented on GIRAPH-37:
-------------------------------------

Very cool. I'll test this on some of my 100M-vertex datasets on EC2. The previous RPC method had massive issues communicating across many workers. Anxious to see this improvement.
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Vinod Kumar Vavilapalli (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107417#comment-13107417 ] 

Vinod Kumar Vavilapalli commented on GIRAPH-37:
-----------------------------------------------

Have you thought about security? In the README, I found that Giraph works with a secure Hadoop installation. I checked the code too (_RPCCommunications_ & related classes) and figured that on each worker, both the RPC server and the clients to other peers use the Hadoop MapReduce JobToken to authenticate connections. If security is a strong requirement for some folks (Y! ?), you should also think about that when you replace Hadoop RPC with a Netty-based solution.


        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Claudio Martella (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271283#comment-13271283 ] 

Claudio Martella commented on GIRAPH-37:
----------------------------------------

I haven't gone through the code, so maybe it's obvious there, but why would the RPC solution change the runtime of the vertex input superstep and of setup?
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107308#comment-13107308 ] 

Jakob Homan commented on GIRAPH-37:
-----------------------------------

Yeah, if no one else has started this, I'd like to begin.  Seeing as GIRAPH-12 didn't end with this solution, I started playing around on the flight back from London and plan on working on this issue this week, now that my vacation is over.  It's a blocker for some things we're trying to do with Giraph at the moment.


[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271550#comment-13271550 ] 

jiraposter@reviews.apache.org commented on GIRAPH-37:
-----------------------------------------------------



bq.  On 2012-05-09 10:10:46, Sebastian Schelter wrote:
bq.  > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java, line 1465
bq.  > <https://reviews.apache.org/r/5074/diff/2/?file=108120#file108120line1465>
bq.  >
bq.  >     I don't like it that a collection is changed outside of the class that owns it. 
bq.  >     
bq.  >     This makes code hard to read and debug. We should rather introduce a method for this in the class that owns this map to have all mutations in one place.

Good point, it's a little hard to understand.  Since this is a Map, we can do as you suggested: keep it in a class and add a method to do the clear().  We can even add methods that iterate over the map, so that no synchronization has to happen outside of the owning class.  I'll do this for all our synchronized objects in the next patch if that's okay with you (the current code does this as well).  It will be a medium-sized change.
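
A minimal sketch of that encapsulation, assuming a hypothetical InMessageStore class (the name and methods are illustrative and not taken from the patch): the map is private, and every mutation goes through a method of the owning class, so callers never synchronize on the map themselves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical owner of a per-vertex message map; illustrative only. */
final class InMessageStore<I, M> {
  // The map never escapes this class, so all mutations live in one place.
  private final ConcurrentHashMap<I, List<M>> messages =
      new ConcurrentHashMap<I, List<M>>();

  /** Appends one message; compute() runs atomically for a given key. */
  void add(I vertexId, M message) {
    messages.compute(vertexId, (id, list) -> {
      List<M> target = (list == null) ? new ArrayList<M>() : list;
      target.add(message);
      return target;
    });
  }

  /** Atomically removes and returns the messages queued for one vertex. */
  List<M> drain(I vertexId) {
    List<M> removed = messages.remove(vertexId);
    return (removed == null) ? Collections.<M>emptyList() : removed;
  }

  /** Drops everything, for example at the end of a superstep. */
  void clear() {
    messages.clear();
  }

  /** Number of vertices that currently have queued messages. */
  int vertexCount() {
    return messages.size();
  }
}
```

With this shape, the call site that used to clear the collection directly would call clear() on the store instead.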


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/#review7728
-----------------------------------------------------------


On 2012-05-09 09:22:36, Avery Ching wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5074/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-05-09 09:22:36)
bq.  
bq.  
bq.  Review request for giraph.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  * Implemented a request/response protocol with netty as a NettyClient and NettyServer.  NettyWorkerClient and NettyWorkerServer implement WorkerClient and WorkerServer, respectively.  Netty is a lot faster since it's non-blocking, so we can interleave computation and communication, as opposed to Hadoop RPC (blocking).
bq.  * The netty server implementation uses concurrent hash maps to improve concurrency instead of synchronized blocks around maps.
bq.  * By default netty is used, but Hadoop RPC can be used with -Dgiraph.useNetty=false
bq.  * Changed the class hierarchy of ServerInterface to WorkerClientServer (WorkerClient and WorkerServer) to support a request/response protocol instead of just RPC
bq.  * In netty, the messages/mutations are gathered by partition and sent out as a partition's worth of messages/mutations
bq.  * Added two new test classes (RequestTest.java and ConnectionTest.java) to test all requests and check netty connections.
bq.  * PageRankBenchmark uses EdgeListVertex as a default
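
The "gathered by partition" point in the summary above can be sketched as follows. This is only an illustration under assumed names (PartitionBatcher and BatchSink are hypothetical and not classes from the patch): messages are buffered per destination partition and handed to the network layer one batch at a time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sender-side cache; the names are illustrative only. */
final class PartitionBatcher<M> {
  /** Receives one batch per flush; stands in for issuing a network request. */
  interface BatchSink<T> {
    void send(int partitionId, List<T> batch);
  }

  private final Map<Integer, List<M>> pending = new HashMap<Integer, List<M>>();
  private final int maxBatchSize;
  private final BatchSink<M> sink;

  PartitionBatcher(int maxBatchSize, BatchSink<M> sink) {
    this.maxBatchSize = maxBatchSize;
    this.sink = sink;
  }

  /** Buffers a message and flushes its partition once the buffer is full. */
  void add(int partitionId, M message) {
    List<M> batch = pending.computeIfAbsent(partitionId, id -> new ArrayList<M>());
    batch.add(message);
    if (batch.size() >= maxBatchSize) {
      flushPartition(partitionId);
    }
  }

  /** Sends whatever is still buffered, e.g. at the end of a superstep. */
  void flush() {
    // Copy the keys first because flushPartition() modifies the map.
    for (Integer partitionId : new ArrayList<Integer>(pending.keySet())) {
      flushPartition(partitionId);
    }
  }

  private void flushPartition(int partitionId) {
    List<M> batch = pending.remove(partitionId);
    if (batch != null && !batch.isEmpty()) {
      sink.send(partitionId, batch);
    }
  }
}
```

The batch size is a tuning knob in this sketch; the summary itself does not say when the patch decides to send a partition's batch.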
bq.  
bq.  
bq.  This addresses bug GIRAPH-37.
bq.      https://issues.apache.org/jira/browse/GIRAPH-37
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMutationsCache.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMessageCache.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestEncoder.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestDecoder.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1332888 
bq.  
bq.  Diff: https://reviews.apache.org/r/5074/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  'mvn verify' passes.  I ran several test runs to gather performance results.  Here is a simple example:
bq.  
bq.  Hadoop RPC:
bq.  hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 5 -V 5000000 -s 5 -e 2 -v
bq.  
bq.  12/05/09 01:59:56 INFO mapred.JobClient:   Giraph Timers
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Total (milliseconds)=167722
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 3 (milliseconds)=24775
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Setup (milliseconds)=2930
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Shutdown (milliseconds)=181
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=51025
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 0 (milliseconds)=21543
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 4 (milliseconds)=19858
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 5 (milliseconds)=2844
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 2 (milliseconds)=24507
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 1 (milliseconds)=20054
bq.  
bq.  Netty:
bq.  hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 5 -V 5000000 -s 5 -e 2 -v
bq.  
bq.  12/05/09 02:06:10 INFO mapred.JobClient:   Giraph Timers
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Total (milliseconds)=57795
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 3 (milliseconds)=7636
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Setup (milliseconds)=3574
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Shutdown (milliseconds)=232
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=13393
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 0 (milliseconds)=5610
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 4 (milliseconds)=8473
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 5 (milliseconds)=1844
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 2 (milliseconds)=7418
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 1 (milliseconds)=9612
bq.  
bq.  These were some median runs. The overall runtime improved from 167722 -> 57795 with Netty (2.9x faster).  Loading the vertices improved from 51025 -> 13393 (3.8x faster).  More results coming tomorrow, but for bigger runs, the improvement is likely to be even more than 3x.
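
The quoted speedups can be checked directly from the timer values in the two logs above. The helper below is a hypothetical one-liner, included only to show the arithmetic.

```java
/** Hypothetical helper for checking the ratios quoted in the review request. */
final class SpeedupCheck {
  /** Ratio of the old runtime to the new runtime, both in milliseconds. */
  static double speedup(long beforeMs, long afterMs) {
    return (double) beforeMs / afterMs;
  }
}
```

Total runtime: 167722 / 57795 is roughly 2.90. Vertex input superstep: 51025 / 13393 is roughly 3.81.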
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Avery
bq.  
bq.


                

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Dmitriy V. Ryaboy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108141#comment-13108141 ] 

Dmitriy V. Ryaboy commented on GIRAPH-37:
-----------------------------------------

(moving my comment from email thread onto jira):

Note that finagle is not thrift specific. It's rpc protocol agnostic.
We can make a finagle-hadooprpc connector. Granted, the thrift
implementation is pretty hardened. Actually the fact that finagle is
independent of the rpc framework may be another reason to use it -- flip
between hadooprpc and thrift depending on whether you want performance
or security.
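
A rough sketch of what "flipping" could look like from Giraph's side, written in plain Java without any Finagle API. Everything here is hypothetical: the WorkerTransport interface, the NamedTransport stub, and the example.rpc.secure key are invented for illustration, and the mapping of hadooprpc to security and thrift to performance is one reading of the comment above.

```java
import java.util.Properties;

/** Hypothetical transport abstraction; not an interface from Giraph or Finagle. */
interface WorkerTransport {
  String name();

  void send(int workerId, byte[] payload);
}

/** Stub transport that only carries a name; a real connector would do I/O. */
final class NamedTransport implements WorkerTransport {
  private final String name;

  NamedTransport(String name) {
    this.name = name;
  }

  @Override
  public String name() {
    return name;
  }

  @Override
  public void send(int workerId, byte[] payload) {
    // Intentionally empty: a real implementation would write to the wire here.
  }
}

final class TransportSelector {
  /** Hypothetical configuration key, used only for this illustration. */
  static final String SECURE_KEY = "example.rpc.secure";

  /** Picks the authenticated transport when security is requested. */
  static WorkerTransport fromConfig(Properties conf) {
    boolean secure = Boolean.parseBoolean(conf.getProperty(SECURE_KEY, "false"));
    return new NamedTransport(secure ? "hadooprpc" : "thrift");
  }
}
```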


[jira] [Updated] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Jakob Homan (Updated) (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jakob Homan updated GIRAPH-37:
------------------------------

    Attachment: GIRAPH-37-wip.patch

Here's a work-in-progress patch for review; I have to take next week to work on something else, so I wanted to get it out before it went stale.  It uses Finagle with Thrift.  This experience was at first challenging due to Finagle ramp-up costs, then nice, and is now challenging again due to stability issues.  95% of the size of the patch is generated thrift code; I'm not usually a fan of including generated code, but as explained below, this is a reasonable approach for Finagle.

The good:
* With this patch I can scale up to about 1k workers, although not reliably (see bad points)
* This approach moves us away from Hadoop RPC, which is good for the upcoming Yarn work and because Hadoop RPC itself is not ideal.
* Looking at what Hyunsik was having to go through when he was looking at Netty+PB, Finagle definitely saves quite a lot of work.
* This exercise has identified several overall improvements that need to be done.  I've opened GIRAPH-57, GIRAPH-55 and GIRAPH-54 for these.

The bad:
* The Thrift-Finagle combination uses a forked version of the thrift compiler to generate the interface Finagle expects.  Once up and running this is fine, but it means that we'd be dependent on this oddity.  Also, we'd need to include the generated code since it's too much to ask regular developers (not interested in the rpc) to download a new thrift compiler from github, compile it, keep it around, etc.
* There are quite a lot of knobs necessary to get a reliable run with a large number of mappers.  This is partially a fact of life of a distributed rpc and we can probably determine some of them programmatically, but at the moment, I can only get successful runs about 2/3 of the time.  The rest of the time I get very difficult-to-decipher stack traces such as:
{noformat}
WARNING: An exception was thrown by a user handler while handling an exception event ([id: 0x4b7f1841, /172.18.67.79:46082 :> esv4-hcl227.corp.linkedin.com/172.18.66.182:30047] EXCEPTION: com.twitter.util.Promise$ImmutableResult: Result set multiple times: Throw(java.lang.RuntimeException: Hit exception in proxied call))
java.lang.RuntimeException: Hit exception in proxied call
	at org.apache.giraph.comm.finaglerpc.ThriftRPCProxyClient$CDLListener.onFailure(ThriftRPCProxyClient.java:91)
	at com.twitter.util.Future$$anonfun$addEventListener$1.apply(Future.scala:277)
	at com.twitter.util.Future$$anonfun$addEventListener$1.apply(Future.scala:276)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$2$$anonfun$8.apply(Future.scala:506)
	at com.twitter.util.Promise$$anon$2$$anonfun$8.apply(Future.scala:497)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.finagle.service.RetryingFilter$$anonfun$1.apply(RetryingFilter.scala:73)
	at com.twitter.finagle.service.RetryingFilter$$anonfun$1.apply(RetryingFilter.scala:56)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$2$$anonfun$8$$anonfun$apply$7.apply(Future.scala:502)
	at com.twitter.util.Promise$$anon$2$$anonfun$8$$anonfun$apply$7.apply(Future.scala:502)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.util.Promise$$anon$1$$anonfun$7.apply(Future.scala:491)
	at com.twitter.util.Promise$$anon$1$$anonfun$7.apply(Future.scala:490)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:471)
	at com.twitter.util.Promise$$anonfun$respond$1.apply(Future.scala:467)
	at com.twitter.concurrent.IVar.set(IVar.scala:50)
	at com.twitter.concurrent.IVar.set(IVar.scala:55)
	at com.twitter.util.Promise.updateIfEmpty(Future.scala:462)
	at com.twitter.util.Promise.update(Future.scala:450)
	at com.twitter.finagle.channel.ChannelService.com$twitter$finagle$channel$ChannelService$$reply(ChannelService.scala:51)
	at com.twitter.finagle.channel.ChannelService$$anon$1.exceptionCaught(ChannelService.scala:74)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:238)
	at com.twitter.finagle.thrift.ThriftFrameCodec.handleUpstream(ThriftFrameCodec.scala:11)
	at org.jboss.netty.channel.Channels.fireExceptionCaught(Channels.java:432)
	at org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:52)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
	at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:76)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:302)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:317)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:299)
	at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:216)
	at com.twitter.finagle.thrift.ThriftFrameCodec.handleUpstream(ThriftFrameCodec.scala:11)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:274)
	at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:261)
	at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:349)
	at org.jboss.netty.channel.socket.nio.NioWorker.processSelectedKeys(NioWorker.java:280)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:200)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:619)
{noformat}
Another one that happens quite a lot is {{Caused by: com.twitter.finagle.UnknownChannelException: com.twitter.util.Promise$ImmutableResult: Result set multiple times: Throw(java.lang.RuntimeException: Hit exception in proxied call)}}.  I think I need some aid from someone more experienced with Finagle, but I'm a bit nervous about the underlying framework being difficult to debug and configure.

Currently the patch passes all unit tests (and needs more for the finagle section itself).  Overall, I think the patch is worth pursuing and could be committed with the Hadoop RPC as the default RPC and the config/stability issues resolved in follow-up patches.  Perhaps it's just an issue of lousy configuration on my part.  Another option would be to look in a different direction, such as MessagePack.

Thoughts?
                

[jira] [Updated] (GIRAPH-37) Implement Netty-backed IPC

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-37:
------------------------------

    Assignee: Avery Ching  (was: Jakob Homan)
     Summary: Implement Netty-backed IPC  (was: Implement Netty-backed rpc solution)
    
> Implement Netty-backed IPC
> --------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Avery Ching
>         Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.


[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271270#comment-13271270 ] 

jiraposter@reviews.apache.org commented on GIRAPH-37:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5074/#review7728
-----------------------------------------------------------

Ship it!


I went through the code (although I don't have much experience with networking code), and everything looks very good.

I tested this patch by computing the connected components of the undirected Wikipedia pagelink graph (6M vertices, 250M edges) on a 6-machine cluster. Everything went fine and I even saw a small improvement in runtime, although the job only takes 4 minutes.




http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/5074/#comment17027>

    I don't like it that a collection is changed outside of the class that owns it. 
    
    This makes code hard to read and debug. We should rather introduce a method for this in the class that owns this map to have all mutations in one place.


- Sebastian


On 2012-05-09 09:22:36, Avery Ching wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/5074/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-05-09 09:22:36)
bq.  
bq.  
bq.  Review request for giraph.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  * Implemented a request/response protocol with netty as a NettyClient and NettyServer.  NettyWorkerClient and NettyWorkerServer implement WorkerClient and WorkerServer, respectively.  Netty is a lot faster since it's non-blocking, so we can interleave computation and communication, as opposed to Hadoop RPC (blocking).
bq.  * The netty server implementation uses concurrent hash maps to improve concurrency instead of synchronized blocks around maps.
bq.  * By default netty is used, but Hadoop RPC can be used with -Dgiraph.useNetty=false
bq.  * Changed the class hierarchy of ServerInterface to WorkerClientServer (WorkerClient and WorkerServer) to support a request/response protocol instead of just RPC
bq.  * In netty, the messages/mutations are gathered by partition and sent out as a partition's worth of messages/mutations
bq.  * Added two new test classes (RequestTest.java and ConnectionTest.java) to test all requests and check netty connections.
bq.  * PageRankBenchmark uses EdgeListVertex as a default
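
The claim that a non-blocking transport lets computation and communication overlap can be illustrated with plain JDK futures. This is a sketch under assumptions, not the patch's Netty code: AsyncSender is a hypothetical class, and each request is modeled as a Runnable handed to an I/O executor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical helper; meant to be driven by a single compute thread. */
final class AsyncSender {
  private final Executor ioExecutor;
  private final List<CompletableFuture<Void>> inFlight =
      new ArrayList<CompletableFuture<Void>>();

  AsyncSender(Executor ioExecutor) {
    this.ioExecutor = ioExecutor;
  }

  /** Starts a request on the I/O executor and returns immediately. */
  void sendAsync(Runnable request) {
    inFlight.add(CompletableFuture.runAsync(request, ioExecutor));
  }

  /** Blocks until every request started so far has completed. */
  void waitAll() {
    CompletableFuture.allOf(inFlight.toArray(new CompletableFuture<?>[0])).join();
    inFlight.clear();
  }
}
```

A compute loop in this style would call sendAsync while it keeps processing vertices and call waitAll once at the end of the superstep, whereas a blocking RPC would stall on every send.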
bq.  
bq.  
bq.  This addresses bug GIRAPH-37.
bq.      https://issues.apache.org/jira/browse/GIRAPH-37
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMutationsCache.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMessageCache.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestEncoder.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestDecoder.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1332888 
bq.    http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1332888 
bq.  
bq.  Diff: https://reviews.apache.org/r/5074/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  'mvn verify' passes.  I ran several test runs to gather performance results.  Here is a simple example:
bq.  
bq.  Hadoop RPC:
bq.  hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 5 -V 5000000 -s 5 -e 2 -v
bq.  
bq.  12/05/09 01:59:56 INFO mapred.JobClient:   Giraph Timers
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Total (milliseconds)=167722
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 3 (milliseconds)=24775
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Setup (milliseconds)=2930
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Shutdown (milliseconds)=181
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=51025
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 0 (milliseconds)=21543
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 4 (milliseconds)=19858
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 5 (milliseconds)=2844
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 2 (milliseconds)=24507
bq.  12/05/09 01:59:56 INFO mapred.JobClient:     Superstep 1 (milliseconds)=20054
bq.  
bq.  Netty:
bq.  hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 5 -V 5000000 -s 5 -e 2 -v
bq.  
bq.  12/05/09 02:06:10 INFO mapred.JobClient:   Giraph Timers
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Total (milliseconds)=57795
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 3 (milliseconds)=7636
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Setup (milliseconds)=3574
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Shutdown (milliseconds)=232
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=13393
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 0 (milliseconds)=5610
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 4 (milliseconds)=8473
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 5 (milliseconds)=1844
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 2 (milliseconds)=7418
bq.  12/05/09 02:06:10 INFO mapred.JobClient:     Superstep 1 (milliseconds)=9612
bq.  
bq.  These were some median runs. The overall runtime improved from 167722 -> 57795 with Netty (2.9x faster).  Loading the vertices improved from 51025 -> 13393 (3.8x faster).  More results coming tomorrow, but for bigger runs, the improvement is likely to be even more than 3x.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Avery
bq.  
bq.


                
> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed IPC

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271706#comment-13271706 ] 

Hudson commented on GIRAPH-37:
------------------------------

Integrated in Giraph-trunk-Commit #107 (See [https://builds.apache.org/job/Giraph-trunk-Commit/107/])
    GIRAPH-37. Implement Netty-backed IPC. (aching) (Revision 1336344)

     Result = FAILURE
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336344
Files : 
* /incubator/giraph/trunk/CHANGELOG
* /incubator/giraph/trunk/pom.xml
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyServer.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClient.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestDecoder.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestEncoder.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestRegistry.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RequestServerHandler.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ResponseClientHandler.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMessageCache.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendMutationsCache.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClient.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerClientServer.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WritableRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestAutoCheckpoint.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestJsonBase64Format.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/TestManualCheckpoint.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/examples/SimpleShortestPathVertexTest.java
* /incubator/giraph/trunk/src/test/java/org/apache/giraph/utils/MockUtils.java

                
> Implement Netty-backed IPC
> --------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Avery Ching
>         Attachments: GIRAPH-37-wip.patch, GIRAPH-37.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.


        

[jira] [Updated] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching updated GIRAPH-37:
------------------------------

    Attachment: GIRAPH-37.patch

Same as reviewboard file, but ensuring the license is granted here.
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271245#comment-13271245 ] 

Avery Ching commented on GIRAPH-37:
-----------------------------------

Thanks Claudio.

Here are more results with a scaled up 10 worker setup:

Hadoop RPC:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=false -w 10 -V 10000000 -s 5 -e 2 -v
12/05/09 02:32:05 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:32:05 INFO mapred.JobClient:     Total (milliseconds)=149880
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 3 (milliseconds)=21575
12/05/09 02:32:05 INFO mapred.JobClient:     Setup (milliseconds)=7428
12/05/09 02:32:05 INFO mapred.JobClient:     Shutdown (milliseconds)=174
12/05/09 02:32:05 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=39558
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 0 (milliseconds)=16887
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 4 (milliseconds)=18613
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 5 (milliseconds)=3292
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 2 (milliseconds)=21313
12/05/09 02:32:05 INFO mapred.JobClient:     Superstep 1 (milliseconds)=21035

Netty:
hadoop jar ~/giraph-0.2-SNAPSHOT-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -w 10 -V 10000000 -s 5 -e 2 -v
12/05/09 02:35:06 INFO mapred.JobClient:   Giraph Timers
12/05/09 02:35:06 INFO mapred.JobClient:     Total (milliseconds)=59270
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 3 (milliseconds)=11827
12/05/09 02:35:06 INFO mapred.JobClient:     Setup (milliseconds)=3196
12/05/09 02:35:06 INFO mapred.JobClient:     Shutdown (milliseconds)=124
12/05/09 02:35:06 INFO mapred.JobClient:     Vertex input superstep (milliseconds)=13130
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 0 (milliseconds)=8564
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 4 (milliseconds)=5540
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 5 (milliseconds)=2012
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 2 (milliseconds)=8601
12/05/09 02:35:06 INFO mapred.JobClient:     Superstep 1 (milliseconds)=6271

These results are fairly similar to the first set (even though there are more workers).  I'm pretty sure we can squeeze more performance out of Netty in future patches (e.g. adding the missing local send optimization, tuning TCP parameters, and exposing more knobs to the user).
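
As a sanity check on "fairly similar", the ratios for the 10-worker run can be computed directly from the timers above. This is only a small arithmetic sketch; the class and method names (TimerSpeedup, speedup) are made up for illustration and are not part of Giraph.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Computes per-phase speedups from the Giraph timer values quoted above. */
final class TimerSpeedup {
  /** Ratio of the Hadoop RPC time to the Netty time for one phase. */
  static double speedup(long hadoopRpcMillis, long nettyMillis) {
    return (double) hadoopRpcMillis / nettyMillis;
  }

  public static void main(String[] args) {
    // Values copied from the 10-worker runs above: {Hadoop RPC, Netty}.
    Map<String, long[]> timers = new LinkedHashMap<>();
    timers.put("Total", new long[] {149880, 59270});
    timers.put("Vertex input superstep", new long[] {39558, 13130});
    timers.put("Superstep 0", new long[] {16887, 8564});
    timers.put("Superstep 1", new long[] {21035, 6271});
    timers.put("Superstep 2", new long[] {21313, 8601});
    timers.put("Superstep 3", new long[] {21575, 11827});
    timers.put("Superstep 4", new long[] {18613, 5540});
    timers.put("Superstep 5", new long[] {3292, 2012});

    for (Map.Entry<String, long[]> e : timers.entrySet()) {
      long[] t = e.getValue();
      System.out.printf("%-24s %.2fx%n", e.getKey(), speedup(t[0], t[1]));
    }
  }
}
```

For these numbers the total runtime ratio is about 2.5x (149880 / 59270) and the vertex input superstep ratio is about 3.0x (39558 / 13130), compared with 2.9x and 3.8x in the 5-worker run.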
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Dmitriy V. Ryaboy (Commented) (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128664#comment-13128664 ] 

Dmitriy V. Ryaboy commented on GIRAPH-37:
-----------------------------------------

OK, I emailed Marius Eriksen (Finagle lead, among other things), and here's his feedback so far:

{quote}
that's great! (that they're doing this). would be happy to help in any way to make it work.

> 1) why a custom thrift compiler? makes distribution of code hard, have
> to make devs install that

this sucks, but it's sadly necessary (unless we can get our work integrated with the standard thrift stack). we do require custom codegen in order to interface with the finagle thrift codec.

we now actually have our own entirely-in-JVM code generator, that parses thrift IDL, etc.-- so at the very least we'll have something portable that also shouldn't require any installation-- presumably the various build systems can download it as a build-only dependency, etc. we're using this internally for a few projects already, but still working out how to widely distribute it.

> 2) gigantic hard to understand stack traces

that's mostly a fact of life, sadly. i mean, with any asynchronous system you have much less context in your stack traces generally, but with proliferation of anonymous closures in the finagle codebase, it's often made even worse.

a few things here: (1) as of 1.9.3 (i notice this patch uses 1.9.0) stacks are now unwound per responder per thread. this means roughly the stacks you observe will only ever be one callback deep. now this might be even worse in terms of debugging, but it does produce cleaner/smaller stack traces.

debuggability is a big concern (both for finagle, and for general use of Futures). one interesting difference between asynchronous systems and synchronous ones is that stack traces don't tell the story, or may tell only part of the story. really what you want is a dispatch *graph*. we have a mechanism in twitter futures (called Locals-- they're like thread locals but instead they're local to the dispatch graph) where we can record dispatches. this would now give us our graph. a little weird, maybe, but certainly something that would be very helpful in many circumstances. i'm still toying around with how to expose them (eg. we could synthesize stacks that are really a topological sort of the dispatch graph in all exceptions encoded by finagle...)

> 3) some stability issues, apparently

i looked at his patch briefly.  this part is suspect (the fact that he throws in a callback).

{code}
+    @Override
+    public void onFailure(Throwable cause) {
+      cdl.countDown();
+      throw new RuntimeException("Hit exception in proxied call", cause);
+    }
{code}
and would cause that exception to be thrown. it's actually harmless in terms of functionality, but it will report the wrong underlying reason.

none of the user provided handlers should throw exceptions. at the same time, the fact that it's reported as "result set multiple times" may indicate a bug somewhere. i'm going to look into that probably by ~wed or so (my schedule is pretty filled up until then).

it's difficult to debug what's going on there (2/3s successful runs) without getting some stats out of the system, and/or diving deeper into the code. it sounds like perhaps the client isn't tuned properly for the particular use case.

anyhow. in my experience, almost *all* debugging of these sorts of systems can be done by looking at the client/server stats. and finagle exports a rich set of stats for both.

use the .reportTo() method in the builder to report to either ostrich or science/commons stats, or provide your own StatsReceiver.

{quote}
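
To make the point about not throwing from a callback concrete, here is a minimal sketch of the alternative: record the failure, release the latch, and rethrow on the thread that is waiting. The Callback interface and the LatchCallback and outcome names are hypothetical stand-ins, not the Finagle API or the code in the patch; only the latch and the "Hit exception in proxied call" message come from the snippet quoted above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

final class RecordingCallback {
  /** Hypothetical callback interface standing in for the one used in the patch. */
  interface Callback<T> {
    void onSuccess(T value);
    void onFailure(Throwable cause);
  }

  /** Records the outcome and releases the waiter; the handlers never throw. */
  static final class LatchCallback<T> implements Callback<T> {
    private final CountDownLatch done = new CountDownLatch(1);
    private final AtomicReference<T> result = new AtomicReference<>();
    private final AtomicReference<Throwable> failure = new AtomicReference<>();

    @Override
    public void onSuccess(T value) {
      result.set(value);
      done.countDown();
    }

    @Override
    public void onFailure(Throwable cause) {
      // Store the cause instead of throwing on the I/O thread.
      failure.set(cause);
      done.countDown();
    }

    /** Blocks until a handler has run, then surfaces any failure on the caller's thread. */
    T await() throws InterruptedException {
      done.await();
      Throwable cause = failure.get();
      if (cause != null) {
        throw new RuntimeException("Hit exception in proxied call", cause);
      }
      return result.get();
    }
  }

  /** Small helper for the demo: turns the outcome into a printable string. */
  static String outcome(LatchCallback<String> cb) {
    try {
      return "ok: " + cb.await();
    } catch (RuntimeException e) {
      return "failed: " + e.getCause().getMessage();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return "interrupted";
    }
  }

  public static void main(String[] args) {
    LatchCallback<String> cb = new LatchCallback<>();
    cb.onFailure(new IllegalStateException("connection reset"));
    System.out.println(outcome(cb));
  }
}
```

With this shape the original cause reaches the caller intact, which is the property the quoted feedback says is lost when the handler itself throws.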
                
> Implement Netty-backed rpc solution
> -----------------------------------
>
>                 Key: GIRAPH-37
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-37
>             Project: Giraph
>          Issue Type: New Feature
>            Reporter: Jakob Homan
>            Assignee: Jakob Homan
>         Attachments: GIRAPH-37-wip.patch
>
>
> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.


       

[jira] [Commented] (GIRAPH-37) Implement Netty-backed IPC

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271696#comment-13271696 ] 

Avery Ching commented on GIRAPH-37:
-----------------------------------

@Claudio,

Vertex input superstep was a blocking operation when sending the vertices to the destination partition owners.  Now it's non-blocking, overlapping communication and computation.

Setup should be ignored.  That is the time to get all the map tasks and pick a master.
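
A rough sketch of what "non-blocking, overlapping communication and computation" means in general terms: the loader keeps building the next batch while earlier batches are still being sent, and only waits once at the end. This is plain java.util.concurrent code with made-up names (OverlappedSend, loadAndSend); it is not the NettyWorkerClient implementation, and the "send" here just increments a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class OverlappedSend {
  /**
   * Builds {@code batches} batches of vertex ids and hands each one to a
   * sender pool without waiting for it. Returns the number of ids delivered.
   */
  static int loadAndSend(int batches, int batchSize) {
    ExecutorService sender = Executors.newFixedThreadPool(2);
    AtomicInteger delivered = new AtomicInteger();
    List<CompletableFuture<Void>> inFlight = new ArrayList<>();
    try {
      for (int b = 0; b < batches; b++) {
        // "Computation": build the next batch on the caller's thread.
        int[] batch = new int[batchSize];
        for (int i = 0; i < batchSize; i++) {
          batch[i] = b * batchSize + i;
        }
        // "Communication": hand the batch off and continue immediately.
        inFlight.add(CompletableFuture.runAsync(
            () -> delivered.addAndGet(batch.length), sender));
      }
      // Wait once at the end, similar to flushing before a barrier.
      CompletableFuture.allOf(inFlight.toArray(new CompletableFuture[0])).join();
      return delivered.get();
    } finally {
      sender.shutdown();
    }
  }

  public static void main(String[] args) {
    System.out.println("delivered = " + loadAndSend(4, 250));
  }
}
```

In the blocking variant the loop would wait for each send to finish before building the next batch, so load time and send time add up instead of overlapping.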
                

        

[jira] [Commented] (GIRAPH-37) Implement Netty-backed IPC

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13271728#comment-13271728 ] 

Hudson commented on GIRAPH-37:
------------------------------

Integrated in Giraph-trunk-Commit #108 (See [https://builds.apache.org/job/Giraph-trunk-Commit/108/])
    GIRAPH-37. Compilation failures on type problem fixes (continued from
previous commit). (Revision 1336361)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1336361
Files : 
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMessagesRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/SendPartitionMutationsRequest.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Edge.java
* /incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexMutations.java

                

        

[jira] [Resolved] (GIRAPH-37) Implement Netty-backed IPC

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Avery Ching resolved GIRAPH-37.
-------------------------------

    Resolution: Fixed

Hudson is successful, closing.
                

        

Re: [jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by Owen O'Malley <ow...@hortonworks.com>.
On Sun, Sep 18, 2011 at 6:10 PM, Dmitriy Ryaboy <dm...@twitter.com> wrote:
> Note that finagle is not thrift specific. It's rpc protocol agnostic.
> We can make a finagle-hadooprpc connector. Granted, the thrift
> implementation is pretty hardened. Actually the fact that finagle is
> independent of rpc framework may be another reason to use it -- flip
> between hadooprpc and thrift depending on whether you want performance
> or security.

I think pulling the security out of Giraph would be a mistake. (Big
surprise, huh?)

On the other hand, the current mechanism is just a shared secret with
digest-md5, which isn't that hard to implement in other schemes.

-- Owen
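
To give a feel for how small a shared-secret check can be, here is a simplified challenge-response sketch using HMAC-MD5 from the JDK. It is explicitly not SASL DIGEST-MD5 and not the Hadoop RPC mechanism; the class and method names (SharedSecretHandshake, newChallenge, respond, verify) and the "job-token" secret are assumptions for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SharedSecretHandshake {
  private static final String ALGORITHM = "HmacMD5";

  /** Server side: a fresh random challenge for each connection. */
  static byte[] newChallenge() {
    byte[] nonce = new byte[16];
    new SecureRandom().nextBytes(nonce);
    return nonce;
  }

  /** Client side: prove knowledge of the secret without sending it. */
  static byte[] respond(byte[] secret, byte[] challenge) {
    try {
      Mac mac = Mac.getInstance(ALGORITHM);
      mac.init(new SecretKeySpec(secret, ALGORITHM));
      return mac.doFinal(challenge);
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException("HMAC computation failed", e);
    }
  }

  /** Server side: recompute the expected response and compare. */
  static boolean verify(byte[] secret, byte[] challenge, byte[] response) {
    return MessageDigest.isEqual(respond(secret, challenge), response);
  }

  public static void main(String[] args) {
    byte[] secret = "job-token".getBytes(StandardCharsets.UTF_8);
    byte[] challenge = newChallenge();
    byte[] response = respond(secret, challenge);
    System.out.println("accepted = " + verify(secret, challenge, response));
  }
}
```

A real replacement would more likely go through javax.security.sasl so that the wire format matches DIGEST-MD5; the sketch only shows the shared-secret idea.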

Re: [jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by Dmitriy Ryaboy <dm...@twitter.com>.
Note that finagle is not thrift specific. It's rpc protocol agnostic.
We can make a finagle-hadooprpc connector. Granted, the thrift
implementation is pretty hardened. Actually the fact that finagle is
independent of rpc framework may be another reason to use it -- flip
between hadooprpc and thrift depending on whether you want performance
or security.

On Sep 18, 2011, at 8:08 AM, "Jakob Homan (JIRA)" <ji...@apache.org> wrote:

>
>    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107455#comment-13107455 ]
>
> Jakob Homan commented on GIRAPH-37:
> -----------------------------------
>
> I'll take a look at Finagle.  Security probably won't be in the first version, but as a veteran of the Hadoop Security Wars myself, I'll be sure it can be supported.
>
>> Implement Netty-backed rpc solution
>> -----------------------------------
>>
>>                Key: GIRAPH-37
>>                URL: https://issues.apache.org/jira/browse/GIRAPH-37
>>            Project: Giraph
>>         Issue Type: New Feature
>>           Reporter: Jakob Homan
>>           Assignee: Jakob Homan
>>
>> GIRAPH-12 considered replacing the current Hadoop-based rpc method with Netty, but went in another direction. I think there is still value in this approach, and will also look at Finagle.
>
> --
> This message is automatically generated by JIRA.
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
>

[jira] [Commented] (GIRAPH-37) Implement Netty-backed rpc solution

Posted by "Jakob Homan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13107455#comment-13107455 ] 

Jakob Homan commented on GIRAPH-37:
-----------------------------------

I'll take a look at Finagle.  Security probably won't be in the first version, but as a veteran of the Hadoop Security Wars myself, I'll be sure it can be supported.
