Posted to dev@giraph.apache.org by "Hudson (JIRA)" <ji...@apache.org> on 2012/10/17 06:40:03 UTC

[jira] [Commented] (GIRAPH-374) Multithreading in input split loading and compute

    [ https://issues.apache.org/jira/browse/GIRAPH-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477591#comment-13477591 ] 

Hudson commented on GIRAPH-374:
-------------------------------

Integrated in Giraph-trunk-Commit #244 (See [https://builds.apache.org/job/Giraph-trunk-Commit/244/])
    GIRAPH-374: Multithreading in input split loading and compute (aching). (Revision 1399090)

     Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1399090
Files : 
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph-formats-contrib/src/main/java/org/apache/giraph/io/hbase/HBaseVertexInputFormat.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedService.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendPartitionCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/WorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/ChannelRotater.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/handler/AddressRequestIdGenerator.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/AggregatorWrapper.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceMaster.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/FinishedSuperstepStats.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphMapper.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GraphState.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/InputSplitsCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/MutableVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/SimpleMutableVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/LoggerUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ProgressableUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/Time.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/BspCase.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/TestBspBasic.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/TestPageRank.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java

                
> Multithreading in input split loading and compute
> -------------------------------------------------
>
>                 Key: GIRAPH-374
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-374
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Avery Ching
>            Assignee: Avery Ching
>         Attachments: GIRAPH-374.2.patch
>
>
> Cleaned up the WorkerClient hierarchy
> - WorkerClientRequestProcessor is a request cache for every thread (input split loading / compute)
> - With RPC gone, got rid of ugly WorkerClientServer and NettyWorkerClientServer
> SendPartitionCache
> Made GraphState immutable for multi-threading
> Added multithreading for loading the input splits
> Added multithreading for compute
> Added thread-level debugging as an option
> Added additional testing on the number of vertices, edges
> Optimization on HashWorkerPartitioner to use CopyOnWriteArrayList instead of a synchronized list (the synchronized list was a bottleneck)
> Added multithreaded TestPageRank test case
> I ran the PageRankBenchmark on 20 workers with 10M vertices and 1B edges. All supersteps take about the same time, so I compared only superstep 0 from each test. Compute performance gains are quite nice (even with one thread, the multithreaded code is a little faster than trunk). Actual gains will depend heavily on the number of cores you have and the parallelism available in the application.
> {code}
> Trunk
> # threads  compute time (secs)   total time (secs)
> 1          89                    97.543
> Multithreading
> 1          86.70094              92.477
> 2          50.41521              57.850
> 4          38.07716              50.246
> 8          38.63188              45.940
> 16         22.999943             48.607
> 24         23.649189             45.112
> 32         21.412325             44.201
> {code}
> We also saw similar gains in input split loading on an internal app. Future work could further improve the scalability of multithreading.
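
To make the benchmark numbers above easier to read, here is a small sketch that turns the superstep 0 compute times into speedup and parallel efficiency relative to the 1-thread multithreaded run. The class and method names (ComputeSpeedup, speedup, efficiency) are made up for this illustration and are not part of Giraph; only the timings come from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: derives speedup figures from the compute times quoted above. */
final class ComputeSpeedup {
    /** Speedup is the baseline time divided by the measured time. */
    static double speedup(double baselineSecs, double measuredSecs) {
        return baselineSecs / measuredSecs;
    }

    /** Parallel efficiency is the speedup divided by the number of threads. */
    static double efficiency(double baselineSecs, double measuredSecs, int threads) {
        return speedup(baselineSecs, measuredSecs) / threads;
    }

    public static void main(String[] args) {
        // Superstep 0 compute times in seconds, copied from the "Multithreading" rows.
        Map<Integer, Double> computeSecs = new LinkedHashMap<>();
        computeSecs.put(1, 86.70094);
        computeSecs.put(2, 50.41521);
        computeSecs.put(4, 38.07716);
        computeSecs.put(8, 38.63188);
        computeSecs.put(16, 22.999943);
        computeSecs.put(24, 23.649189);
        computeSecs.put(32, 21.412325);

        double baseline = computeSecs.get(1);
        for (Map.Entry<Integer, Double> row : computeSecs.entrySet()) {
            int threads = row.getKey();
            double secs = row.getValue();
            System.out.printf("%2d threads: speedup %.2fx, efficiency %.0f%%%n",
                threads, speedup(baseline, secs), 100.0 * efficiency(baseline, secs, threads));
        }
    }
}
```

On these figures the compute phase is roughly 4x faster at 32 threads than at 1 thread, so efficiency per thread falls off well before 32 threads, which is consistent with the note about further scalability work.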

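The pattern described in the change list (one request cache per thread, immutable shared state, compute spread across threads) can be sketched roughly as follows. This is not the Giraph code from the patch: MultithreadedComputeSketch, SuperstepState, ComputeTask and computeSuperstep are hypothetical names, and partitions are modeled as plain lists of vertex ids, purely as an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MultithreadedComputeSketch {
    /** Immutable per-superstep state, so it can be shared across threads without locking. */
    static final class SuperstepState {
        final long superstep;
        SuperstepState(long superstep) { this.superstep = superstep; }
    }

    /** One task per thread; each task pulls whole partitions from a shared queue. */
    static final class ComputeTask implements Callable<Long> {
        private final Queue<List<Integer>> partitions;
        private final SuperstepState state;
        // Thread-private buffer standing in for a per-thread request cache.
        private final List<String> requestCache = new ArrayList<>();

        ComputeTask(Queue<List<Integer>> partitions, SuperstepState state) {
            this.partitions = partitions;
            this.state = state;
        }

        @Override
        public Long call() {
            long verticesComputed = 0;
            List<Integer> partition;
            while ((partition = partitions.poll()) != null) {
                for (int vertexId : partition) {
                    // Placeholder for the vertex compute call: buffer one outgoing request.
                    requestCache.add("vertex " + vertexId + " @ superstep " + state.superstep);
                    verticesComputed++;
                }
            }
            requestCache.clear(); // a real worker would flush buffered requests to the network here
            return verticesComputed;
        }
    }

    /** Runs one superstep over all partitions and returns the number of vertices computed. */
    static long computeSuperstep(List<List<Integer>> partitions, int numThreads, long superstep) {
        Queue<List<Integer>> queue = new ConcurrentLinkedQueue<>(partitions);
        SuperstepState state = new SuperstepState(superstep);
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int i = 0; i < numThreads; i++) {
                futures.add(pool.submit(new ComputeTask(queue, state)));
            }
            long total = 0;
            for (Future<Long> future : futures) {
                total += future.get(); // waits for the thread and collects its count
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("compute thread failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = new ArrayList<>();
        for (int p = 0; p < 8; p++) {
            List<Integer> partition = new ArrayList<>();
            for (int v = 0; v < 1000; v++) {
                partition.add(p * 1000 + v);
            }
            partitions.add(partition);
        }
        System.out.println(computeSuperstep(partitions, 4, 0L)); // prints 8000
    }
}
```

Because each task owns its cache and only reads the immutable state, the only shared mutable structure in this sketch is the partition queue, which is a lock-free concurrent collection.
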
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira