You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Alessandro Presta <al...@fb.com> on 2013/02/22 19:25:53 UTC

Review Request: Range-partitioning and edge locality benchmark

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9562/
-----------------------------------------------------------

Review request for giraph.


Description
-------

Range-based partitioning can drastically reduce network communication when using a vertex id space where close ids correspond to vertices likely to be connected.
We currently have an incomplete implementation of range-based partitioning that tries to be very generic (allowing arbitrarily different partition sizes).
Talking about this with Avery, we thought that for now it's better to add a simpler version (which tries to split as evenly as possible) and leave the current classes there in case someone wants to implement a more complex logic.
A nice-to-have is also extending the pseudo-random formats to generate a required ratio of partition-local edges, in order to estimate the impact of locality with benchmarks.


This addresses bug GIRAPH-535.
    https://issues.apache.org/jira/browse/GIRAPH-535


Diffs
-----

  giraph-core/src/main/java/org/apache/giraph/benchmark/AggregatorsBenchmark.java a82a9f88c7947980ae0c512981000edd494bb30f 
  giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 8341dce2d66e924c402c27b252db604b0b1f13be 
  giraph-core/src/main/java/org/apache/giraph/benchmark/RandomMessageBenchmark.java d0d80afde3fa46b521e146dcc2c3bc5036f84052 
  giraph-core/src/main/java/org/apache/giraph/benchmark/ShortestPathsBenchmark.java 52bbac4bc7f47efbb7b309d071cee6c6dd5364f6 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java e0aeba347564d25e39c27ed67cc1393b5ac62d99 
  giraph-core/src/main/java/org/apache/giraph/io/formats/PseudoRandomEdgeInputFormat.java 90f814ceb960b1bc570106c06c7155f12b3acae4 
  giraph-core/src/main/java/org/apache/giraph/io/formats/PseudoRandomInputFormatConstants.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/formats/PseudoRandomLocalEdgesHelper.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/io/formats/PseudoRandomVertexInputFormat.java f2a2c93890eb483367cb7cc37ca9b3ceeb5da2ac 
  giraph-core/src/main/java/org/apache/giraph/partition/HashMasterPartitioner.java a9611d9b21a264da2dc7fdcbbd6a955a6f85fee0 
  giraph-core/src/main/java/org/apache/giraph/partition/HashWorkerPartitioner.java bb6e38d64442bb65aeda70794905693b53126f80 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionBalancer.java bdbd467a352f27ac9cfacbd2933f0f8520ea60d9 
  giraph-core/src/main/java/org/apache/giraph/partition/PartitionUtils.java e472ac64f312df815862bf364015761bd469d370 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleIntRangePartitionerFactory.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleLongRangePartitionerFactory.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleRangeMasterPartitioner.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/partition/SimpleRangeWorkerPartitioner.java PRE-CREATION 
  giraph-core/src/test/java/org/apache/giraph/io/TestJsonBase64Format.java f4d97a4aad2b151c0dce77716c32edccb541c0fb 
  giraph-examples/src/test/java/org/apache/giraph/TestGraphPartitioner.java 2e12bdc3d8f8a63395b070bc7cfdb3e2474af8a1 
  giraph-examples/src/test/java/org/apache/giraph/TestPartitionContext.java cdf1f65bff36a4a198a94f7ac75ef2ade8e66eba 
  giraph-examples/src/test/java/org/apache/giraph/examples/TestPageRank.java 5e6159606c82a68bdc9db90139ff9ca32fdf56b1 

Diff: https://reviews.apache.org/r/9562/diff/


Testing
-------

1) mvn verify

2) Example runs of PageRankBenchmark with 1B vertices, 100B edges, 200 workers, 32 compute threads:
Range-partitioning, random edges (run with "-p 1")
—
Superstep 0 (milliseconds) 66,906
Superstep 1 (milliseconds) 72,256
Superstep 2 (milliseconds) 71,551
Range-partitioning, 75% local edges (run with "-Dpartition.vertexKeySpaceSize=1000000000 -p 1 -l 0.75")
—
Superstep 0 (milliseconds) 27,883
Superstep 1 (milliseconds) 34,693
Superstep 2 (milliseconds) 33,729


Thanks,

Alessandro Presta