Posted to dev@giraph.apache.org by Avery Ching <av...@gmail.com> on 2011/11/09 12:18:16 UTC

Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
-----------------------------------------------------------

Review request for giraph.


Summary
-------

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that use the hashCode
of a Java object to decide in which partition to place the vertex.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.
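A minimal sketch of the two default strategies and the routing step described above. The names PartitionSketch, hashPartition, hashRangePartition and routeToOwners are hypothetical, not Giraph's API, and the sketch assumes a fixed partition count and a precomputed partition-to-worker table. The hash scheme takes hashCode() modulo the partition count; the hash-range scheme is assumed here to cut the unsigned 32-bit hash space into equal contiguous ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helpers illustrating hash and hash-range partition selection. */
final class PartitionSketch {
    private PartitionSketch() {}

    /** Hash scheme: hashCode() modulo the partition count. */
    static int hashPartition(Object vertexId, int numPartitions) {
        // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
        return Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    /** Hash-range scheme: the unsigned 32-bit hash space is cut into equal contiguous ranges. */
    static int hashRangePartition(Object vertexId, int numPartitions) {
        long unsignedHash = vertexId.hashCode() & 0xFFFFFFFFL; // 0 .. 2^32 - 1
        // floor(unsignedHash * numPartitions / 2^32) always lies in [0, numPartitions).
        return (int) ((unsignedHash * numPartitions) >>> 32);
    }

    /** Groups vertex ids read from a split by the worker that owns their hash partition. */
    static Map<Integer, List<Long>> routeToOwners(
            List<Long> vertexIds, int numPartitions, int[] partitionToWorker) {
        Map<Integer, List<Long>> byWorker = new TreeMap<>();
        for (Long id : vertexIds) {
            int worker = partitionToWorker[hashPartition(id, numPartitions)];
            byWorker.computeIfAbsent(worker, w -> new ArrayList<>()).add(id);
        }
        return byWorker;
    }

    public static void main(String[] args) {
        int[] partitionToWorker = {0, 1, 0, 1}; // 4 partitions spread over 2 workers
        // Ownership depends only on the id, not on its position in the split.
        System.out.println(routeToOwners(Arrays.asList(7L, 3L, 10L, 1L), 4, partitionToWorker));
        // prints {0=[10], 1=[7, 3, 1]}
    }
}
```

Because ownership depends only on the vertex id, reading the same split in reverse order sends every vertex to the same worker, which is why the ordering constraint on input splits goes away.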

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.
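The restart path can be sketched the same way: the master's assignment decides ownership, and each worker reads only the partitions it owns. This assumes, hypothetically, that a checkpoint can be read partition by partition through a readPartition callback; the real checkpoint layout is not shown in this review, and CheckpointLoadSketch is an illustrative name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical worker-side restart step: read only the partitions the master assigned to us. */
final class CheckpointLoadSketch {
    private CheckpointLoadSketch() {}

    /**
     * @param checkpointedPartitions ids of all partitions present in the checkpoint
     * @param assignment partition id -> owning worker id, as decided by the master
     * @param myWorkerId id of the worker doing the restart
     * @param readPartition assumed callback that reads one partition's data
     */
    static Map<Integer, byte[]> loadOwnPartitions(
            List<Integer> checkpointedPartitions,
            Map<Integer, Integer> assignment,
            int myWorkerId,
            Function<Integer, byte[]> readPartition) {
        Map<Integer, byte[]> mine = new LinkedHashMap<>();
        for (Integer partitionId : checkpointedPartitions) {
            Integer owner = assignment.get(partitionId);
            // Partitions owned by other workers are never read here.
            if (owner != null && owner == myWorkerId) {
                mine.put(partitionId, readPartition.apply(partitionId));
            }
        }
        return mine;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> assignment = new HashMap<>();
        for (int p = 0; p < 4; p++) {
            assignment.put(p, p % 2); // master spreads partitions over workers 0 and 1
        }
        int[] reads = {0};
        Map<Integer, byte[]> mine = loadOwnPartitions(Arrays.asList(0, 1, 2, 3), assignment, 1,
                pid -> { reads[0]++; return new byte[] {pid.byteValue()}; });
        System.out.println(mine.keySet() + " loaded with " + reads[0] + " reads");
        // prints [1, 3] loaded with 2 reads
    }
}
```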

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible to support several different graph
partitioning schemes (e.g. hash-based, hash-range-based, and, for
special cases, fully range-based).
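One way to picture the pluggable design is a factory that pairs a master-side component (assigning partitions to workers) with a worker-side component (mapping vertex ids to partitions). The diffs in this thread list MasterGraphPartitioner, WorkerGraphPartitioner and, in the later revision, GraphPartitionerFactory, but the interfaces below are simplified hypothetical stand-ins, not their actual signatures.

```java
/** Master side: decides which worker owns each partition (array index = partition id). */
interface MasterPartitionerSketch {
    int[] assignPartitions(int numPartitions, int numWorkers);
}

/** Worker side: maps a vertex id to its partition id. */
interface WorkerPartitionerSketch<I> {
    int partitionOf(I vertexId, int numPartitions);
}

/** Factory pairing the two halves of one partitioning scheme. */
interface PartitionerFactorySketch<I> {
    MasterPartitionerSketch createMaster();

    WorkerPartitionerSketch<I> createWorker();
}

/** One pluggable scheme: round-robin ownership plus hashCode()-based lookup. */
final class HashSchemeFactory<I> implements PartitionerFactorySketch<I> {
    @Override
    public MasterPartitionerSketch createMaster() {
        return (numPartitions, numWorkers) -> {
            int[] owners = new int[numPartitions];
            for (int p = 0; p < numPartitions; p++) {
                owners[p] = p % numWorkers; // round-robin over workers
            }
            return owners;
        };
    }

    @Override
    public WorkerPartitionerSketch<I> createWorker() {
        return (vertexId, numPartitions) -> Math.floorMod(vertexId.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionerFactorySketch<Long> factory = new HashSchemeFactory<>();
        int[] owners = factory.createMaster().assignPartitions(4, 2);
        int partition = factory.createWorker().partitionOf(10L, 4);
        System.out.println("vertex 10 -> partition " + partition + " -> worker " + owners[partition]);
        // prints vertex 10 -> partition 2 -> worker 0
    }
}
```

Swapping in a different scheme then means supplying another factory, without touching the code that calls it.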

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
    https://issues.apache.org/jira/browse/GIRAPH-11


Diffs
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1198972 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1198972 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1196639 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/site/site.xml 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1199643 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1186590 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1199643 

Diff: https://reviews.apache.org/r/2788/diff


Testing
-------

Local and MR unit tests.  Added some simple unit tests for the out-of-order input splits and other balancing algorithms.


Thanks,

Avery


Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

Posted by Avery Ching <av...@gmail.com>.

> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > Overall I like it.  Please avoid non-essential format changes in large patches; when reviewing it's like trying to run a marathon with a pebble in your shoe.  There needs to be quite a bit of unit test coverage on the new classes.  Most of them should be amenable to straight-up unit tests rather than ZK-involved integration tests.

Since it's a straight switch from VertexRange objects to Partition objects, everything is covered by the same existing integration tests.  I have also added a few more integration tests that cover reverse ordering, different algorithms, etc.  I agree that more unit testing should be done rather than relying on integration testing.  If you don't mind, I'd like to add those in a later issue, as this one is already too big and needs to be committed soon.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java, line 111
> > <https://reviews.apache.org/r/2788/diff/2/?file=57771#file57771line111>
> >
> >     return type has changed. javadoc needs to be updated.

Fixed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java, line 62
> > <https://reviews.apache.org/r/2788/diff/2/?file=57781#file57781line62>
> >
> >     switch statement?

In this case, I prefer the if/else if/else logic, since I have to scope each block; a switch would be a bit longer with the extra scoping, e.g. case 3: { long...
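To illustrate the scoping point: every case of a Java switch shares one block scope, so cases that declare the same local variable each need their own braces, while each branch of an if/else chain is already its own block. The class name and values below are made up and do not reflect SimpleMutateGraphVertex's actual logic.

```java
/** Hypothetical comparison of the two styles; both methods return the same values. */
final class ScopingSketch {
    private ScopingSketch() {}

    // if/else chain: each branch is its own block, so reusing the name delta is fine.
    static int withIfElse(int step) {
        if (step == 1) {
            int delta = 10;
            return step + delta;
        } else if (step == 3) {
            int delta = 30;
            return step + delta;
        } else {
            return step;
        }
    }

    // switch: all cases share one scope, so each case that declares delta
    // needs its own braces to avoid a duplicate-variable compile error.
    static int withSwitch(int step) {
        switch (step) {
            case 1: {
                int delta = 10;
                return step + delta;
            }
            case 3: {
                int delta = 30;
                return step + delta;
            }
            default:
                return step;
        }
    }

    public static void main(String[] args) {
        for (int step = 0; step < 4; step++) {
            System.out.println(step + ": " + withIfElse(step) + " " + withSwitch(step));
        }
    }
}
```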


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java, line 49
> > <https://reviews.apache.org/r/2788/diff/2/?file=57784#file57784line49>
> >
> >     This seems like a dangerous thing to leave lying around, even for example purposes.  Is there another example that we can generate which might be more useful?

I have moved it to the tests (TestGraphPartitioner), so that people don't use it by mistake.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java, line 123
> > <https://reviews.apache.org/r/2788/diff/2/?file=57785#file57785line123>
> >
> >     indent +4

Fixed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java, line 128
> > <https://reviews.apache.org/r/2788/diff/2/?file=57785#file57785line128>
> >
> >     need debugging guards here and +2 lines

Fixed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java, line 157
> > <https://reviews.apache.org/r/2788/diff/2/?file=57785#file57785line157>
> >
> >     log guard

Fixed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java, line 162
> > <https://reviews.apache.org/r/2788/diff/2/?file=57785#file57785line162>
> >
> >     ditto

Fixed.
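The log guards requested above follow a common Java idiom: check the level before building an expensive debug message, so the string concatenation is skipped when debug output is off. With log4j this is typically written as if (LOG.isDebugEnabled()); the sketch below uses java.util.logging (FINE standing in for debug) only so that it runs without dependencies, and describe() and messagesBuilt are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of a guarded debug log statement. */
final class LogGuardSketch {
    static final Logger LOG = Logger.getLogger(LogGuardSketch.class.getName());
    /** Counts how often the (potentially expensive) message was actually built. */
    static int messagesBuilt = 0;

    private LogGuardSketch() {}

    static String describe(long vertexId, int numMessages) {
        messagesBuilt++;
        return "vertex " + vertexId + " received " + numMessages + " messages";
    }

    static void onMessages(long vertexId, int numMessages) {
        // Guard: the message string is only built when FINE logging is enabled.
        if (LOG.isLoggable(Level.FINE)) {
            LOG.fine(describe(vertexId, numMessages));
        }
    }

    public static void main(String[] args) {
        // With the default INFO level, FINE is disabled, so describe() is never called.
        onMessages(42L, 3);
        System.out.println("messages built: " + messagesBuilt);
    }
}
```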


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java, line 745
> > <https://reviews.apache.org/r/2788/diff/2/?file=57791#file57791line745>
> >
> >     typo. send -> sent

Fixed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java, line 1579
> > <https://reviews.apache.org/r/2788/diff/2/?file=57791#file57791line1579>
> >
> >     typo. sent -> send

Fixed.  This is sad, especially since English is my only language =).


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java, line 33
> > <https://reviews.apache.org/r/2788/diff/2/?file=57806#file57806line33>
> >
> >     Better to call it a factory?

Agreed, changed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java, line 76
> > <https://reviews.apache.org/r/2788/diff/2/?file=57811#file57811line76>
> >
> >     typo: dependant -> dependent

Changed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java, line 123
> > <https://reviews.apache.org/r/2788/diff/2/?file=57814#file57814line123>
> >
> >     rename: value -> totalValue, to be consistent with usage.

Changed.


> On 2011-11-14 20:54:28, Jakob Homan wrote:
> > http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java, line 117
> > <https://reviews.apache.org/r/2788/diff/2/?file=57824#file57824line117>
> >
> >     I'm unclear on this.

Unfortunately, RangePartitionerFactory is abstract; it needs implementations for the various index types.  A developer can use RangeWorkerPartitioner as a starting point for their particular implementation.  This is somewhat experimental work, but the idea is that it will allow very advanced users to customize partitioning based on ranges of their particular index type.  I am making this class abstract, with a prominent notice on what needs to be done if you want to use it.
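A minimal sketch of why such a class cannot be concrete on its own, assuming partitions are defined by sorted split points over the vertex index type: the binary search is generic, but the ordering of ids is not, so a subclass must supply it. RangePartitionerSketch and LongRangePartitioner are hypothetical names; this is not the code of RangeWorkerPartitioner or RangePartitionerFactory.

```java
/**
 * Hypothetical range partitioner over index type I. Split points are assumed
 * sorted ascending; partition k holds ids in [split[k-1], split[k]).
 */
abstract class RangePartitionerSketch<I> {
    private final I[] splitPoints;

    protected RangePartitionerSketch(I[] splitPoints) {
        this.splitPoints = splitPoints.clone();
    }

    /** Index-type-specific ordering; this is what each implementation must provide. */
    protected abstract int compareIds(I a, I b);

    /** Returns the number of split points <= id, found by binary search. */
    int partitionOf(I id) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareIds(splitPoints[mid], id) <= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo; // in [0, splitPoints.length]: splitPoints.length + 1 partitions
    }
}

/** Concrete implementation for long vertex ids. */
final class LongRangePartitioner extends RangePartitionerSketch<Long> {
    LongRangePartitioner(Long... splitPoints) {
        super(splitPoints);
    }

    @Override
    protected int compareIds(Long a, Long b) {
        return Long.compare(a, b);
    }

    public static void main(String[] args) {
        LongRangePartitioner partitioner = new LongRangePartitioner(10L, 20L);
        System.out.println(partitioner.partitionOf(5L) + " " + partitioner.partitionOf(10L)
                + " " + partitioner.partitionOf(25L));
        // prints 0 1 2
    }
}
```

With split points 10 and 20 there are three partitions: ids below 10, ids from 10 to 19, and ids of 20 or more.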


- Avery


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/#review3211
-----------------------------------------------------------


On 2011-11-14 22:24:27, Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2788/
> -----------------------------------------------------------
> 
> (Updated 2011-11-14 22:24:27)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> -------
> 
> Warning: This is a very large change!
> 
> Vertex ranges no longer exist.  A generic partitioner handles the
> division of vertex ids to partitions.  As a default, there is a
> HashPartitioner and a HashRangePartitioner that use the hashCode
> of a Java object to decide in which partition to place the vertex.
> Developers can write their own algorithm to determine how to change
> the partitioning as well as implement the assignment of partitions to
> workers.  All vertices loaded from the input split are sent to the
> owner of the partition rather than loaded locally.  This eliminates the
> constraint that the vertices must be ordered in the input split.
> 
> The checkpoint format has been changed to suit the new partition
> style.  Checkpoints are now a lot simpler.  The master will assign
> partitions and the workers will only load their own partitions from
> the checkpoint.
> 
> Unfortunately, the vertex range implementation was baked into almost
> every aspect of the code (hence the ridiculous size of this diff).
> But now it should be flexible to support several different graph
> partitioning schemes (e.g. hash-based, hash-range-based, and, for
> special cases, fully range-based).
> 
> Sorry for the long delay, but this was pretty involved.
> 
> 
> This addresses bug GIRAPH-11.
>     https://issues.apache.org/jira/browse/GIRAPH-11
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitionerFactory.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitionerFactory.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitionerFactory.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionerFactory.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1201630 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1201630 
> 
> Diff: https://reviews.apache.org/r/2788/diff
> 
> 
> Testing
> -------
> 
> Local and MR unit tests.  Added some simple unit tests for the out-of-order input splits and other balancing algorithms.
> 
> 
> Thanks,
> 
> Avery
> 
>


Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

Posted by Jakob Homan <jg...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/#review3211
-----------------------------------------------------------


Overall I like it.  Please avoid non-essential format changes in large patches; when reviewing it's like trying to run a marathon with a pebble in your shoe.  There needs to be quite a bit of unit test coverage on the new classes.  Most of them should be amenable to straight-up unit tests rather than ZK-involved integration tests.


http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java
<https://reviews.apache.org/r/2788/#comment7131>

    return type has changed. javadoc needs to be updated.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java
<https://reviews.apache.org/r/2788/#comment7152>

    switch statement?



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java
<https://reviews.apache.org/r/2788/#comment7156>

    This seems like a dangerous thing to leave lying around, even for example purposes.  Is there another example that we can generate which might be more useful?



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
<https://reviews.apache.org/r/2788/#comment7158>

    indent +4



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
<https://reviews.apache.org/r/2788/#comment7159>

    need debugging guards here and +2 lines



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
<https://reviews.apache.org/r/2788/#comment7161>

    log guard



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java
<https://reviews.apache.org/r/2788/#comment7162>

    ditto



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/2788/#comment7168>

    typo. send -> sent



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/2788/#comment7173>

    typo. sent -> send



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java
<https://reviews.apache.org/r/2788/#comment7178>

    Better to call it a factory?



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java
<https://reviews.apache.org/r/2788/#comment7180>

    typo: dependant -> dependent



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java
<https://reviews.apache.org/r/2788/#comment7182>

    rename: value -> totalValue, to be consistent with usage.



http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java
<https://reviews.apache.org/r/2788/#comment7185>

    I'm unclear on this.


- Jakob


On 2011-11-14 06:56:19, Avery Ching wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/2788/
> -----------------------------------------------------------
> 
> (Updated 2011-11-14 06:56:19)
> 
> 
> Review request for giraph.
> 
> 
> Summary
> -------
> 
> Warning: This is a very large change!
> 
> Vertex ranges no longer exist.  A generic partitioner handles the
> division of vertex ids to partitions.  As a default, there is a
> HashPartitioner and a HashRangePartitioner that use the hashCode
> of a Java object to decide in which partition to place the vertex.
> Developers can write their own algorithm to determine how to change
> the partitioning as well as implement the assignment of partitions to
> workers.  All vertices loaded from the input split are sent to the
> owner of the partition rather than loaded locally.  This eliminates the
> constraint that the vertices must be ordered in the input split.
> 
> The checkpoint format has been changed to suit the new partition
> style.  Checkpoints are now a lot simpler.  The master will assign
> partitions and the workers will only load their own partitions from
> the checkpoint.
> 
> Unfortunately, the vertex range implementation was baked into almost
> every aspect of the code (hence the ridiculous size of this diff).
> But now it should be flexible to support several different graph
> partitioning schemes (e.g. hash-based, hash-range-based, and, for
> special cases, fully range-based).
> 
> Sorry for the long delay, but this was pretty involved.
> 
> 
> This addresses bug GIRAPH-11.
>     https://issues.apache.org/jira/browse/GIRAPH-11
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1201607 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1186590 
>   http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1201607 
> 
> Diff: https://reviews.apache.org/r/2788/diff
> 
> 
> Testing
> -------
> 
> Local and MR unit tests.  Added some simple unit tests for out-of-order input splits and the other balancing algorithms.
> 
> 
> Thanks,
> 
> Avery
> 
>


Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
-----------------------------------------------------------

(Updated 2011-11-14 22:24:27.293676)


Review request for giraph.


Changes
-------

Made changes based on Jakob's review.


Summary
-------

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that will use the hashCode
of a Java object to decide which partition to place the vertex in.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.
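For readers new to the partitioner idea, here is a minimal sketch of how a vertex id's hashCode() could be mapped to a partition index under the two default schemes: plain hashing scatters ids by modulo, while hash-range partitioning gives each partition a contiguous slice of the hash space. The class and method names are hypothetical and this is not the code in the diff; it only assumes that vertex ids implement hashCode().

```java
/**
 * Hypothetical sketch: map a vertex id to a partition index using only
 * its hashCode(). Not the actual Giraph partitioner classes.
 */
public final class PartitionSketch {
  private PartitionSketch() {}

  /** Hash partitioning: ids are scattered across partitions by modulo. */
  public static int hashPartition(Object vertexId, int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    // floorMod keeps the result in [0, numPartitions) even for negative hash codes.
    return Math.floorMod(vertexId.hashCode(), numPartitions);
  }

  /** Hash-range partitioning: each partition owns a contiguous slice of the hash space. */
  public static int hashRangePartition(Object vertexId, int numPartitions) {
    if (numPartitions <= 0) {
      throw new IllegalArgumentException("numPartitions must be positive");
    }
    // Shift the signed 32-bit hash into [0, 2^32) while preserving its order.
    long unsignedHash = (long) vertexId.hashCode() - Integer.MIN_VALUE;
    // Scale into [0, numPartitions); the product stays below 2^63, so no overflow.
    return (int) ((unsignedHash * numPartitions) >>> 32);
  }

  public static void main(String[] args) {
    int[] ids = {Integer.MIN_VALUE, -5, 0, 7, Integer.MAX_VALUE};
    for (int id : ids) {
      System.out.println(id + " -> hash " + hashPartition(id, 4)
          + ", hash-range " + hashRangePartition(id, 4));
    }
  }
}
```

Math.floorMod is used rather than Math.abs(hash) % n because Math.abs(Integer.MIN_VALUE) is still negative. The hash-range variant keeps neighbouring hash values in the same partition, which is what makes splitting or merging ranges possible later.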

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.
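A minimal sketch of the restore flow described above, assuming the master's assignment is a plain map from partition id to worker name and that a checkpoint can be viewed as a map from partition id to serialized data. The round-robin policy and all names are hypothetical; the actual checkpoint format in the diff is not reproduced here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch: the master assigns partitions to workers, and each
 * worker restores only its own partitions from a shared checkpoint.
 */
public final class CheckpointLoadSketch {
  private CheckpointLoadSketch() {}

  /** Master side: assign partition ids to workers round-robin (an assumed policy). */
  public static Map<Integer, String> assignPartitions(int numPartitions, List<String> workers) {
    Map<Integer, String> owners = new HashMap<>();
    for (int partitionId = 0; partitionId < numPartitions; partitionId++) {
      owners.put(partitionId, workers.get(partitionId % workers.size()));
    }
    return owners;
  }

  /** Worker side: keep only the checkpointed partitions the master assigned to this worker. */
  public static Map<Integer, String> loadOwnPartitions(
      String worker, Map<Integer, String> owners, Map<Integer, String> checkpoint) {
    Map<Integer, String> loaded = new TreeMap<>();
    for (Map.Entry<Integer, String> entry : checkpoint.entrySet()) {
      if (worker.equals(owners.get(entry.getKey()))) {
        loaded.put(entry.getKey(), entry.getValue());
      }
    }
    return loaded;
  }

  public static void main(String[] args) {
    Map<Integer, String> owners = assignPartitions(4, List.of("worker0", "worker1"));
    Map<Integer, String> checkpoint = Map.of(0, "p0", 1, "p1", 2, "p2", 3, "p3");
    System.out.println("worker1 restores " + loadOwnPartitions("worker1", owners, checkpoint));
  }
}
```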

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible enough to support several different graph
partitioning schemes (e.g. hash-based, hash-range-based, and for
special cases, fully ranged-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
    https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitionerFactory.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitionerFactory.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitionerFactory.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionerFactory.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1201630 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1201630 

Diff: https://reviews.apache.org/r/2788/diff


Testing
-------

Local and MR unit tests.  Added some simple unit tests for out-of-order input splits and the other balancing algorithms.


Thanks,

Avery


Re: Review Request: GIRAPH-11 : Improve the graph distribution of Giraph

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2788/
-----------------------------------------------------------

(Updated 2011-11-14 06:56:19.251685)


Review request for giraph.


Changes
-------

Updated the diff as per Hyunsik's request to build against recent trunk changes.  While I was waiting I added some fixes and additions as well.

Upgrade ZooKeeper to 3.3.3 from 3.3.1.

Fixed a bug in PseudoRandomVertexInputFormat.java where not all of the edges were added (hasEdge is not the right place to look for the edge).

Fixed bug in BasicRPCCommunications when putting to a local inPartitionMap

Added counter for last checkpointed superstep

The master now refreshes its progress every 60 seconds while waiting for workers, so that the job isn't killed.

Fixed bugs where vertexCounter, finishedVertexCoutner, edgeCounter, and the sentMessages counter were not reset on every update (values were just added cumulatively).

Added more status messages to help with debugging.

Turned off speculative execution for Giraph jobs (speculative execution is not a good idea for Giraph).

Added analysis of the partition balancing for debugging
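To illustrate the counter fix listed above: when a job counter only supports increment-style updates, setting it to the current total means incrementing by the difference from its present value; adding the whole total each time makes it grow cumulatively. This is a minimal sketch that uses AtomicLong as a stand-in for such a counter; the method names are hypothetical and this is not the code in the diff.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch of the counter fix, with AtomicLong standing in for an
 * increment-only job counter. Single-threaded use is assumed.
 */
public final class CounterUpdateSketch {
  private CounterUpdateSketch() {}

  /** Buggy pattern: adds the whole total on every update, so values accumulate. */
  public static void updateCumulatively(AtomicLong counter, long currentTotal) {
    counter.addAndGet(currentTotal);
  }

  /** Fixed pattern: increment by the delta so the counter ends up equal to currentTotal. */
  public static void updateToCurrent(AtomicLong counter, long currentTotal) {
    // get() followed by addAndGet() is not atomic as a pair; fine for a single updater.
    counter.addAndGet(currentTotal - counter.get());
  }

  public static void main(String[] args) {
    AtomicLong buggy = new AtomicLong();
    AtomicLong fixed = new AtomicLong();
    long[] vertexCountPerSuperstep = {100, 100, 90};
    for (long total : vertexCountPerSuperstep) {
      updateCumulatively(buggy, total);
      updateToCurrent(fixed, total);
    }
    System.out.println("buggy=" + buggy.get() + ", fixed=" + fixed.get());
  }
}
```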
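The progress-refresh item in the list above can be illustrated with a wait loop that wakes up at a fixed interval to report liveness until all workers are done. This is a minimal sketch, assuming a CountDownLatch represents "all workers have reported" and a Runnable stands in for whatever progress call the task framework provides; both are hypothetical stand-ins, not the mechanism in the diff.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: wait for workers, but report progress every interval
 * so that the framework does not kill the task for inactivity.
 */
public final class ProgressWhileWaitingSketch {
  private ProgressWhileWaitingSketch() {}

  /** Returns the number of progress reports made while waiting. */
  public static int awaitWorkers(CountDownLatch workersDone, long refreshInterval,
      TimeUnit unit, Runnable reportProgress) {
    int refreshes = 0;
    try {
      // await() returns false when the interval elapses before all workers finish.
      while (!workersDone.await(refreshInterval, unit)) {
        reportProgress.run();
        refreshes++;
      }
    } catch (InterruptedException e) {
      // Preserve the interrupt status and stop waiting.
      Thread.currentThread().interrupt();
    }
    return refreshes;
  }

  public static void main(String[] args) {
    CountDownLatch done = new CountDownLatch(1);
    new Thread(() -> {
      try {
        Thread.sleep(250);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      done.countDown();
    }).start();
    // The change above uses a 60 second interval; 100 ms keeps the demo short.
    int refreshes = awaitWorkers(done, 100, TimeUnit.MILLISECONDS,
        () -> System.out.println("still waiting for workers..."));
    System.out.println("workers finished after " + refreshes + " progress reports");
  }
}
```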


Summary
-------

Warning: This is a very large change!

Vertex ranges no longer exist.  A generic partitioner handles the
division of vertex ids to partitions.  As a default, there is a
HashPartitioner and a HashRangePartitioner that will use the hashCode
of a Java object to decide which partition to place the vertex in.
Developers can write their own algorithm to determine how to change
the partitioning as well as implement the assignment of partitions to
workers.  All vertices loaded from the input split are sent to the
owner of the partition rather than loaded locally.  This eliminates the
constraint that the vertices must be ordered in the input split.
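The paragraph above says vertices read from an input split are sent to their partition's owner instead of being loaded locally. A minimal sketch of that grouping step follows, assuming hash partitioning and a list in which index p names the worker that owns partition p; the class, method, and worker names are hypothetical. Because each vertex is routed on its own, the order of ids in the split no longer matters.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: vertices read from an input split are grouped by the
 * worker that owns their partition, ready to be sent, rather than kept locally.
 */
public final class InputSplitRoutingSketch {
  private InputSplitRoutingSketch() {}

  /**
   * partitionOwners.get(p) names the worker that owns partition p. Hash
   * partitioning is assumed here; any partitioner could be plugged in.
   */
  public static Map<String, List<Long>> routeVertices(
      List<Long> vertexIdsFromSplit, List<String> partitionOwners) {
    Map<String, List<Long>> outgoing = new HashMap<>();
    for (Long vertexId : vertexIdsFromSplit) {
      int partitionId = Math.floorMod(vertexId.hashCode(), partitionOwners.size());
      String owner = partitionOwners.get(partitionId);
      // Buffer the vertex for its owner; the order of ids in the split does not matter.
      outgoing.computeIfAbsent(owner, w -> new ArrayList<>()).add(vertexId);
    }
    return outgoing;
  }

  public static void main(String[] args) {
    // Deliberately unordered ids, which an input split is now allowed to contain.
    List<Long> split = List.of(5L, 0L, 3L, 1L, 4L, 2L);
    List<String> owners = List.of("worker0", "worker1", "worker0");
    System.out.println(routeVertices(split, owners));
  }
}
```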

The checkpoint format has been changed to suit the new partition
style.  Checkpoints are now a lot simpler.  The master will assign
partitions and the workers will only load their own partitions from
the checkpoint.

Unfortunately, the vertex range implementation was baked into almost
every aspect of the code (hence the ridiculous size of this diff).
But now it should be flexible enough to support several different graph
partitioning schemes (e.g. hash-based, hash-range-based, and for
special cases, fully ranged-based).

Sorry for the long delay, but this was pretty involved.


This addresses bug GIRAPH-11.
    https://issues.apache.org/jira/browse/GIRAPH-11


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/incubator/giraph/trunk/pom.xml 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/benchmark/PseudoRandomVertexInputFormat.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/CommunicationsInterface.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/RPCCommunications.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerInterface.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerCommunications.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexInputFormat.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/GeneratedVertexReader.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MaxAggregator.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/MinAggregator.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleMutateGraphVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SimpleSuperstepVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepBalancer.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/SuperstepHashPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/examples/VerifyMessage.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/AutoBalancer.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BasicVertexRangeBalancer.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspService.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceMaster.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/BspUtils.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GlobalStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphState.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/LongDoubleFloatDoubleVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/MutableVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/StaticBalancer.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/Vertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexEdgeCount.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRange.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/VertexRangeBalancer.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/WorkerInfo.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/BasicPartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/GraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangePartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashRangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/MasterGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionBalancer.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionExchange.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionUtils.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeMasterPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionOwner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitionStats.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangePartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeSplitHint.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/RangeWorkerPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/utils/WritableUtils.java PRE-CREATION 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/main/java/org/apache/giraph/zk/ZooKeeperExt.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1201607 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1186590 
  http://svn.apache.org/repos/asf/incubator/giraph/trunk/src/test/java/org/apache/giraph/TestVertexRangeBalancer.java 1201607 

Diff: https://reviews.apache.org/r/2788/diff


Testing
-------

Local and MR unit tests.  Added some simple unit tests for out-of-order input splits and the other balancing algorithms.


Thanks,

Avery