You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Alessandro Presta <al...@fb.com> on 2012/08/15 18:49:51 UTC

Re: Review Request: Move part of the graph out-of-core when memory is low

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 15, 2012, 4:49 p.m.)


Review request for giraph and Avery Ching.


Description
-------

I replaced the HashMap that stores partitions in a worker with a WorkerPartitionMap.
A WorkerPartitionMap has a normal in-memory map, and the ability to store entire partitions to the local FS when memory is low.

In order to provide the normal views of the contents of a map, we operate lazily by loading the out-of-core partitions as we iterate.
We always add the requested partition to the in-memory map (moving another one to disk to make room) in order to allow modification.

The option "giraph.outOfCoreGraph" controls whether we use WorkerPartitionMap or a normal HashMap as before.
"giraph.minFreeMemoryRatio" controls how much free memory we want to preserve out of the maximum available memory for the program.
If out-of-core is enabled and the memory limit is exceeded, we start spilling partitions to disk.

My main concern now is with input splitting: we may have to reorganize that phase so that messages are not all accumulated in temporary in-memory storage before they are transferred to the (disk-backed) WorkerPartitionMap.
See http://bit.ly/NwwFGI and subsequent comments for more context.


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

- Unit test.
- "mvn verify" with different settings in order to trigger the out-of-core mechanism.
- Plan on doing some benchmarking.


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 21, 2012, 4:31 p.m., Maja Kabiljo wrote:
> > Nice work, Alessandro.
> > 
> > Have you checked from which places in the code getPartition is called? For example, from the quick look, one of the things which worries me is in NettyWorkerServer.prepareSuperstep, line 118. service.getVertex() ends up calling it, and the order in which vertices are visited here is not sorted by partition. So you could possibly end up reading and writing the same partition to disk many times. Did you maybe do some testing to see how many times are the partitions actually swapped to/from disk during a superstep? 
> > (the one which I mentioned could be fixed by adding hasVertexId() to PartitionStore, since we just check if the vertex exists at that place)

I've analyzed the calls to getPartition(s), and besides the compute loop, we have the one you mentioned and the loop that initializes partition stats at the end of input superstep.

For the former, the only way is to add a set of vertex ids to PartitionStore, but this makes it less scalable, as then we have to assume that we can store all the vertex ids; on the other hand, MessageStore makes this very same assumption. Another problem with keeping track of vertex ids is that we have some code that calls Partition#putVertex() directly. Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.

For the latter, we already keep track of the number of vertices for out-of-core partitions, but doing the same for edges presents simular issues, plus it doesn't play well with mutations (which is why Partition#getEdgeCount() currently computes the sum on the fly).

In other words, yes, there are other potential savings, but not easy as they may seem.
As I've said before, I think we need to discuss substantial design changes before we can take these approaches to their full extent.

Anyway, for the record, here's the log of a full superstep for a worker in a PageRank job with 10 workers, 5 partitions in memory out of 10. The graph is a clique so there are always messages for each partition:
http://pastebin.com/raw.php?i=igYsAJGT
I can see 6 read/write operations by DiskBackedPartitionStore, so it doesn't look that bad.


> On Aug. 21, 2012, 4:31 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java, lines 105-107
> > <https://reviews.apache.org/r/5987/diff/5/?file=143400#file143400line105>
> >
> >     This is not used anymore.

Oh, thanks!


> On Aug. 21, 2012, 4:31 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java, line 54
> > <https://reviews.apache.org/r/5987/diff/5/?file=143406#file143406line54>
> >
> >     Typo.

Whoops, fixed!


> On Aug. 21, 2012, 4:31 p.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java, line 116
> > <https://reviews.apache.org/r/5987/diff/5/?file=143403#file143403line116>
> >
> >     I think you want to do this before you put it in the map. Like this someone can take it and lock it before the thread creating it does.
> >     Could you also say which assumptions are you making for concurrency in this code so it would be easier to review (i.e. which operations can happen in parallel and which can't)?

Well spotted, many thanks!

The assumption is that addPartition() and addPartitionVertices() are called in a multi-threaded context (processing incoming vertex requests) whereas getPartition(), removePartition() and similar are called in a sequential context (e.g. iterating through partitions to call compute).


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10579
-----------------------------------------------------------


On Aug. 21, 2012, 11:16 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:16 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10579
-----------------------------------------------------------


Nice work, Alessandro.

Have you checked from which places in the code getPartition is called? For example, from the quick look, one of the things which worries me is in NettyWorkerServer.prepareSuperstep, line 118. service.getVertex() ends up calling it, and the order in which vertices are visited here is not sorted by partition. So you could possibly end up reading and writing the same partition to disk many times. Did you maybe do some testing to see how many times are the partitions actually swapped to/from disk during a superstep? 
(the one which I mentioned could be fixed by adding hasVertexId() to PartitionStore, since we just check if the vertex exists at that place)


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/5987/#comment22646>

    This is not used anymore.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
<https://reviews.apache.org/r/5987/#comment22648>

    I think you want to do this before you put it in the map. Like this someone can take it and lock it before the thread creating it does.
    Could you also say which assumptions are you making for concurrency in this code so it would be easier to review (i.e. which operations can happen in parallel and which can't)?



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
<https://reviews.apache.org/r/5987/#comment22647>

    Typo.


- Maja Kabiljo


On Aug. 21, 2012, 11:16 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:16 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10621
-----------------------------------------------------------


Looks good to me now. Again, great work!

- Maja Kabiljo


On Aug. 22, 2012, 1 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2012, 1 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 22, 2012, 1 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Replaced one last usage of MapMaker with Maps.newConcurrentMap.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 22, 2012, 12:36 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Iterate over messages by partition in prepareSuperstep().


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 22, 2012, 10:34 a.m.)


Review request for giraph and Avery Ching.


Changes
-------

This fixes the potential problem with maxPartitionsInMemory.
Will have a look into keeping vertex ids and stats in PartitionStore before finalizing.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
>     Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess last night I was really too tired :P
>     Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 edges/vertex and got the same pattern):
>     http://pastebin.com/raw.php?i=jGBzaZA8
>     So we're loading each out-of-core partition twice. I get this same result with different numbers of in-memory partitions. I added some logging and it looks like MessageStore#getDestinationVertices() is returning vertices grouped by partition. Do you have any idea why? I wonder if it's because of hashing (messages are stored in a hash-map indexed by vertex id, and partitions are formed by hashing vertex ids).
>     An adversarial configuration could make us load partitions back and forth in a random fashion.
>     
>     Regarding addPartition, I mean that whenever we add a partition in memory, we currently simply move the reference (fast), whereas if we need to keep track of vertex ids we would have to copy them all in a global map. Anyway hold on, I'll see if I can do something about this. I'm mainly concerned with code calling Partition#putVertex() directly though, I see no way to disallow it.
> 
> Maja Kabiljo wrote:
>     Oh right, it's because SimpleMessageStore keeps messages grouped by partition, and then when you call getDestinationVertices it just appends all of them. If you try using out-of-core messaging, you should see much different results.
>     
>     I see now what you are saying for addPartition, but it doesn't look like a big deal to me.
> 
> Alessandro Presta wrote:
>     Having to keep all the vertex ids in memory as Writables seems a big overhead to me (both memory, and time to keep it updated for every single operation). I think it would be better to choose a wise access pattern instead.
>     I see that MessageStoreByPartition has a getPartitionDestinationVertices(int partitionId). Can't we make that a top-level requirement so that we can do:
>     
>     for each partition {
>        for each destination vertex in partition {
>            resolve vertex
>        }
>     }
> 
> Maja Kabiljo wrote:
>     Right, we can do that - ServerData.getCurrentMessageStore() returns MessageStoreByPartition. Looking at the code again, I remembered that the outer out-of-core message store also appends vertices of its inner partition message stores, so it should be O(p) reads there also.
>     
>     Still, for most of applications I think it would be smaller overhead to keep vertex ids in memory than to have to have twice as many partition reads from disk. But I agree that this could be avoided with some code redesign, maybe doing vertex resolutions and computations for partition together. Anyway, since this can be only 2x and not much more as I first thought, and with in-core there is no overhead, I'm ok with it now. Just something to think about in the future.

Exactly what I was thinking: first of all let's make sure the access pattern is optimal. Then we can try packing different phases together so that we load each partition once, do vertex resolutions and compute, move on to the next partition.
Note that we should do the same for mutations, but we should first treat mutation requests like messages and group them by partition (we can also argue that a mutation-intensive algorithm would need the same out-of-core functionality that we have for messages).

Will submit an updated patch shortly.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
>     Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess last night I was really too tired :P
>     Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 edges/vertex and got the same pattern):
>     http://pastebin.com/raw.php?i=jGBzaZA8
>     So we're loading each out-of-core partition twice. I get this same result with different numbers of in-memory partitions. I added some logging and it looks like MessageStore#getDestinationVertices() is returning vertices grouped by partition. Do you have any idea why? I wonder if it's because of hashing (messages are stored in a hash-map indexed by vertex id, and partitions are formed by hashing vertex ids).
>     An adversarial configuration could make us load partitions back and forth in a random fashion.
>     
>     Regarding addPartition, I mean that whenever we add a partition in memory, we currently simply move the reference (fast), whereas if we need to keep track of vertex ids we would have to copy them all in a global map. Anyway hold on, I'll see if I can do something about this. I'm mainly concerned with code calling Partition#putVertex() directly though, I see no way to disallow it.
> 
> Maja Kabiljo wrote:
>     Oh right, it's because SimpleMessageStore keeps messages grouped by partition, and then when you call getDestinationVertices it just appends all of them. If you try using out-of-core messaging, you should see much different results.
>     
>     I see now what you are saying for addPartition, but it doesn't look like a big deal to me.

Having to keep all the vertex ids in memory as Writables seems a big overhead to me (both memory, and time to keep it updated for every single operation). I think it would be better to choose a wise access pattern instead.
I see that MessageStoreByPartition has a getPartitionDestinationVertices(int partitionId). Can't we make that a top-level requirement so that we can do:

for each partition {
   for each destination vertex in partition {
       resolve vertex
   }
}


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
> > What do you mean by this?

Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess last night I was really too tired :P
Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 edges/vertex and got the same pattern):
http://pastebin.com/raw.php?i=jGBzaZA8
So we're loading each out-of-core partition twice. I get this same result with different numbers of in-memory partitions. I added some logging and it looks like MessageStore#getDestinationVertices() is returning vertices grouped by partition. Do you have any idea why? I wonder if it's because of hashing (messages are stored in a hash-map indexed by vertex id, and partitions are formed by hashing vertex ids).
An adversarial configuration could make us load partitions back and forth in a random fashion.

Regarding addPartition, I mean that whenever we add a partition in memory, we currently simply move the reference (fast), whereas if we need to keep track of vertex ids we would have to copy them all in a global map. Anyway hold on, I'll see if I can do something about this. I'm mainly concerned with code calling Partition#putVertex() directly though, I see no way to disallow it.


> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java, lines 235-236
> > <https://reviews.apache.org/r/5987/diff/8/?file=143644#file143644line235>
> >
> >     You could end up with more than maxInMemoryPartitions.

I see, we could have two threads concurrently write to inMemoryPartitions. Fixed by synchronizing on inMemoryPartitions and returning early upon successful put().


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


On Aug. 21, 2012, 11:02 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:02 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.

> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
>     Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess last night I was really too tired :P
>     Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 edges/vertex and got the same pattern):
>     http://pastebin.com/raw.php?i=jGBzaZA8
>     So we're loading each out-of-core partition twice. I get this same result with different numbers of in-memory partitions. I added some logging and it looks like MessageStore#getDestinationVertices() is returning vertices grouped by partition. Do you have any idea why? I wonder if it's because of hashing (messages are stored in a hash-map indexed by vertex id, and partitions are formed by hashing vertex ids).
>     An adversarial configuration could make us load partitions back and forth in a random fashion.
>     
>     Regarding addPartition, I mean that whenever we add a partition in memory, we currently simply move the reference (fast), whereas if we need to keep track of vertex ids we would have to copy them all in a global map. Anyway hold on, I'll see if I can do something about this. I'm mainly concerned with code calling Partition#putVertex() directly though, I see no way to disallow it.
> 
> Maja Kabiljo wrote:
>     Oh right, it's because SimpleMessageStore keeps messages grouped by partition, and then when you call getDestinationVertices it just appends all of them. If you try using out-of-core messaging, you should see much different results.
>     
>     I see now what you are saying for addPartition, but it doesn't look like a big deal to me.
> 
> Alessandro Presta wrote:
>     Having to keep all the vertex ids in memory as Writables seems a big overhead to me (both memory, and time to keep it updated for every single operation). I think it would be better to choose a wise access pattern instead.
>     I see that MessageStoreByPartition has a getPartitionDestinationVertices(int partitionId). Can't we make that a top-level requirement so that we can do:
>     
>     for each partition {
>        for each destination vertex in partition {
>            resolve vertex
>        }
>     }

Right, we can do that - ServerData.getCurrentMessageStore() returns MessageStoreByPartition. Looking at the code again, I remembered that the outer out-of-core message store also appends vertices of its inner partition message stores, so it should be O(p) reads there also.

Still, for most of applications I think it would be smaller overhead to keep vertex ids in memory than to have to have twice as many partition reads from disk. But I agree that this could be avoided with some code redesign, maybe doing vertex resolutions and computations for partition together. Anyway, since this can be only 2x and not much more as I first thought, and with in-core there is no overhead, I'm ok with it now. Just something to think about in the future.


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.

> On Aug. 22, 2012, 8:55 a.m., Maja Kabiljo wrote:
> > Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.
> > 
> > > Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
> > What do you mean by this?
> 
> Alessandro Presta wrote:
>     Yeah, 10 was an unfortunate choice (more partitions than vertices!), I guess last night I was really too tired :P
>     Here's what I see with 1000 vertices, 999 edges/vertex (I also tried 10 edges/vertex and got the same pattern):
>     http://pastebin.com/raw.php?i=jGBzaZA8
>     So we're loading each out-of-core partition twice. I get this same result with different numbers of in-memory partitions. I added some logging and it looks like MessageStore#getDestinationVertices() is returning vertices grouped by partition. Do you have any idea why? I wonder if it's because of hashing (messages are stored in a hash-map indexed by vertex id, and partitions are formed by hashing vertex ids).
>     An adversarial configuration could make us load partitions back and forth in a random fashion.
>     
>     Regarding addPartition, I mean that whenever we add a partition in memory, we currently simply move the reference (fast), whereas if we need to keep track of vertex ids we would have to copy them all in a global map. Anyway hold on, I'll see if I can do something about this. I'm mainly concerned with code calling Partition#putVertex() directly though, I see no way to disallow it.

Oh right, it's because SimpleMessageStore keeps messages grouped by partition, and then when you call getDestinationVertices it just appends all of them. If you try using out-of-core messaging, you should see much different results.

I see now what you are saying for addPartition, but it doesn't look like a big deal to me.


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


On Aug. 22, 2012, 10:34 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 22, 2012, 10:34 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375843 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10614
-----------------------------------------------------------


Alessandro, can you please try the same test with a lot of vertices (you only had 10 vertices here)? If vertices are visited in random order in the prepareSuperstep, I think you should have much more disk operations.

> Also, addPartition() would then have to read all vertex ids even when the partition is in memory, which would make it way slower in the standard use case.
What do you mean by this?


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
<https://reviews.apache.org/r/5987/#comment22714>

    You could end up with more than maxInMemoryPartitions.


- Maja Kabiljo


On Aug. 21, 2012, 11:02 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 11:02 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 21, 2012, 11:02 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Removed another debug message :(


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 21, 2012, 10:59 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Oh my, better go to sleep.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10596
-----------------------------------------------------------


Looks like we picked up a few changes from testing?  After this, I'm good.  Maja?


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
<https://reviews.apache.org/r/5987/#comment22681>

    Need to change back to false here. =)



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
<https://reviews.apache.org/r/5987/#comment22682>

    Need to change back to false here. =)



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java
<https://reviews.apache.org/r/5987/#comment22683>

    Assuming you want to change this back to 10?  Or did you want 2.


- Avery Ching


On Aug. 21, 2012, 10:43 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 21, 2012, 10:43 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 21, 2012, 10:43 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Addressed Maja's comments.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 21, 2012, 11:16 a.m.)


Review request for giraph and Avery Ching.


Changes
-------

Addressed Avery's comments.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1375453 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 21, 2012, 7:21 a.m., Avery Ching wrote:
> > This is a really great way to add a terrific features while improving the codebase. Looks very nice and got rid of a lot of bad stuff.
> > 
> > There is a compile issue though:
> > 
> > [ERROR] COMPILATION ERROR : 
> > [INFO] -------------------------------------------------------------
> > [ERROR] /Users/aching/git/git_svn_giraph_trunk/target/munged/test/org/apache/giraph/TestMutateGraphVertex.java:[34,7] class TestMutateGraph is public, should be declared in a file named TestMutateGraph.java
> > 
> > Does this happen for you?
> > 
> > Would you mind trying out the distributed tests and run a few benchmarks on a real cluster just to verify?

Sorry for that, it's the usual problem with renaming files. I could never get post-review to work with my checkout. Maja and I were thinking it would be helpful to have some guidelines on the website on how to create patches and (for committers) how to apply them (I use "patch -p0 -i" currently).
For now, can you please manually rename TestMutateGraphVertex.java to TestMutateGraph.java after applying the patch? That should fix the build.

Other than that, this passes mvn verify and pseudo-distributed mode tests with 4 tasks, both with RPC and Netty, with/without out-of-core enabled and with 1 or 2 max partitions in memory.
You can find some benchmarks here (https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280) and in the following post.


> On Aug. 21, 2012, 7:21 a.m., Avery Ching wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java, lines 1494-1497
> > <https://reviews.apache.org/r/5987/diff/4/?file=142956#file142956line1494>
> >
> >     This is all handled by the send partition request I'm guessing?

Yes, since doRequest directly adds to PartitionStore.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10570
-----------------------------------------------------------


On Aug. 20, 2012, 2:04 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2012, 2:04 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Eli Reisman <in...@gmail.com>.
if you need a cluster run let me know I'd be happy to take another whack at
it too. Our clusters our clearing up nicely...!

On Tue, Aug 21, 2012 at 12:21 AM, Avery Ching <av...@gmail.com> wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/#review10570
> -----------------------------------------------------------
>
>
> This is a really great way to add a terrific features while improving the
> codebase. Looks very nice and got rid of a lot of bad stuff.
>
> There is a compile issue though:
>
> [ERROR] COMPILATION ERROR :
> [INFO] -------------------------------------------------------------
> [ERROR]
> /Users/aching/git/git_svn_giraph_trunk/target/munged/test/org/apache/giraph/TestMutateGraphVertex.java:[34,7]
> class TestMutateGraph is public, should be declared in a file named
> TestMutateGraph.java
>
> Does this happen for you?
>
> Would you mind trying out the distributed tests and run a few benchmarks
> on a real cluster just to verify?
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
> <https://reviews.apache.org/r/5987/#comment22617>
>
>     This is way cleaner.  Definitely pleased to see this.
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
> <https://reviews.apache.org/r/5987/#comment22618>
>
>     Thanks for getting rid of this, much cleaner!
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
> <https://reviews.apache.org/r/5987/#comment22619>
>
>     Can you reformat this comment?
>
>     /**
>      * Partition...
>      */
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
> <https://reviews.apache.org/r/5987/#comment22620>
>
>     This can be final right?
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
> <https://reviews.apache.org/r/5987/#comment22621>
>
>     This is all handled by the send partition request I'm guessing?
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
> <https://reviews.apache.org/r/5987/#comment22623>
>
>     MapMaker from guava is a bit more flexible perhaps.
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
> <https://reviews.apache.org/r/5987/#comment22622>
>
>     You might want to use MapMaker(), from Guava here.
>
>
>
>
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
> <https://reviews.apache.org/r/5987/#comment22624>
>
>     indent?
>
>
> - Avery Ching
>
>
> On Aug. 20, 2012, 2:04 p.m., Alessandro Presta wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/5987/
> > -----------------------------------------------------------
> >
> > (Updated Aug. 20, 2012, 2:04 p.m.)
> >
> >
> > Review request for giraph and Avery Ching.
> >
> >
> > Description
> > -------
> >
> > I gave this another shot. This time it plays nicely with input
> superstep: I replaced both the temporary partitions and the worker
> partition map with a common data structure, PartitionStore, held in
> ServerData. This is similar to what we now have for messages.
> >
> > Partition is now thread-safe so that we can have concurrent calls to
> putVertex().
> >
> > SimplePartitionStore is backed by a concurrent hash map (nothing new
> here, except that we skip copying partitions to the worker).
> >
> > DiskBackedPartitionStore can hold up to a user-defined number of
> partitions in memory, and spill the remaining ones to disk. Each partition
> is stored in a separate file.
> > Adding vertices to an out-of-core partition consists in appending them
> to the file, which makes processing vertex requests relatively fast.
> > We use a ReadWriteLock for each partition: performing operations on a
> partition held in memory only requires a read-lock (since Partition is
> thread-safe), while creating a new partition, moving it in and out of core
> or appending vertices requires a write-lock (we can't have concurrent
> writes).
> >
> > Also note that this breaks Hadoop RPC: I preferred to keep it clean
> (this also shows what code we get rid of) instead of working around it. I
> suppose the Netty security patch will be completed soon. If not, I will
> restore RPC compatibility.
> >
> > More here:
> https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> >
> >
> > This addresses bug GIRAPH-249.
> >     https://issues.apache.org/jira/browse/GIRAPH-249
> >
> >
> > Diffs
> > -----
> >
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.javaPRE-CREATION
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.javaPRE-CREATION
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.javaPRE-CREATION
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java1374990
> >
> http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.javaPRE-CREATION
> >
> > Diff: https://reviews.apache.org/r/5987/diff/
> >
> >
> > Testing
> > -------
> >
> > mvn verify and pseudo-distributed mode tests with both
> SimplePartitionStore and DiskBackedPartitionStore
> >
> >
> > Thanks,
> >
> > Alessandro Presta
> >
> >
>
>

Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10570
-----------------------------------------------------------


This is a really great way to add a terrific features while improving the codebase. Looks very nice and got rid of a lot of bad stuff.

There is a compile issue though:

[ERROR] COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
[ERROR] /Users/aching/git/git_svn_giraph_trunk/target/munged/test/org/apache/giraph/TestMutateGraphVertex.java:[34,7] class TestMutateGraph is public, should be declared in a file named TestMutateGraph.java

Does this happen for you?

Would you mind trying out the distributed tests and run a few benchmarks on a real cluster just to verify?


http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
<https://reviews.apache.org/r/5987/#comment22617>

    This is way cleaner.  Definitely pleased to see this.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java
<https://reviews.apache.org/r/5987/#comment22618>

    Thanks for getting rid of this, much cleaner!



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/5987/#comment22619>

    Can you reformat this comment?
    
    /**
     * Partition...
     */



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/5987/#comment22620>

    This can be final right?



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java
<https://reviews.apache.org/r/5987/#comment22621>

    This is all handled by the send partition request I'm guessing?



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java
<https://reviews.apache.org/r/5987/#comment22623>

    MapMaker from guava is a bit more flexible perhaps.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java
<https://reviews.apache.org/r/5987/#comment22622>

    You might want to use MapMaker(), from Guava here.



http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java
<https://reviews.apache.org/r/5987/#comment22624>

    indent?


- Avery Ching


On Aug. 20, 2012, 2:04 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 20, 2012, 2:04 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1374990 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 20, 2012, 2:04 p.m.)


Review request for giraph and Avery Ching.


Changes
-------

Restored Hadoop RPC.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1374990 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 17, 2012, 11:12 a.m.)


Review request for giraph and Avery Ching.


Changes
-------

Switched to a more predictable load pattern, where a fixed subset of partitions is always in memory, and we keep a slot for loading the out-of-core partitions one after the other.
This also simplifies concurrency in addPartition()/addPartitionVertices(), because we don't need locks to operate on in-memory partitions, and thus we can use normal reentrant locks for out-of-core partitions.
Performance is basically the same on all previous benchmarks, with a slight improvement in SSSP.

I also expanded the unit test and added a convenient Partition#putVertices().

Maja: yours is a good idea, I bet we can add a PartitionStore for immutable graphs which keeps vertex ids/values in one file (continuosly updated) and edges in another one, written only once.
Will also take a look at checkpointing.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1374194 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.

> On Aug. 16, 2012, 12:42 p.m., Maja Kabiljo wrote:
> > I'll take a deeper look into this once you have a final version, for now one comment: When we do checkpointing, there is no need to deserialize all the partitions and then write them again to checkpoint file, instead you could just copy the bytes from out-of-core files (I did something similar with message stores if you want to take a look).
> > 
> > Another thing, would it be possible to make some kind of optimization for non-mutating graphs - to keep the edges on disk all the time and just rewrite vertex values?

Having thought about it more, I don't think it's a good idea to separate vertex storage from edge storage within the current API: the user can redefine readFields/write to do something non-standard. Either we disallow that (making the Writable methods of Vertex final) or we can't make any assumption when storing the graph out of core. With messages it's different because we have taken those away from user code.

Regarding the other idea on checkpointing, that's a bit messy too because out-of-core partitions don't necessarily have the same representation as normally serialized partitions. I'd be for evaluating the benefits before introducing even more clutter.

By the way, for this iteration I'm trying not to change the standard behavior a lot, much in the same vein as out-of-core messages.
For the future, I'm fine with disrupting even the basic building blocks (think Vertex, Partition) but we need to have some sort of new design in mind, otherwise we end up with an inconsistent codebase and lots of hacks to glue things together.
A discussion about this in the dev list is just getting started.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10400
-----------------------------------------------------------


On Aug. 17, 2012, 11:12 a.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 17, 2012, 11:12 a.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1374194 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/#review10400
-----------------------------------------------------------


I'll take a deeper look into this once you have a final version, for now one comment: When we do checkpointing, there is no need to deserialize all the partitions and then write them again to checkpoint file, instead you could just copy the bytes from out-of-core files (I did something similar with message stores if you want to take a look).

Another thing, would it be possible to make some kind of optimization for non-mutating graphs - to keep the edges on disk all the time and just rewrite vertex values?

- Maja Kabiljo


On Aug. 15, 2012, 4:51 p.m., Alessandro Presta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/5987/
> -----------------------------------------------------------
> 
> (Updated Aug. 15, 2012, 4:51 p.m.)
> 
> 
> Review request for giraph and Avery Ching.
> 
> 
> Description
> -------
> 
> I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
> 
> Partition is now thread-safe so that we can have concurrent calls to putVertex().
> 
> SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
> 
> DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
> Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
> We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).
> 
> Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.
> 
> More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280
> 
> 
> This addresses bug GIRAPH-249.
>     https://issues.apache.org/jira/browse/GIRAPH-249
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1373331 
>   http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/5987/diff/
> 
> 
> Testing
> -------
> 
> mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore
> 
> 
> Thanks,
> 
> Alessandro Presta
> 
>


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 15, 2012, 4:51 p.m.)


Review request for giraph and Avery Ching.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 15, 2012, 4:51 p.m.)


Review request for giraph and Avery Ching.


Description
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta


Re: Review Request: Move part of the graph out-of-core when memory is low

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/5987/
-----------------------------------------------------------

(Updated Aug. 15, 2012, 4:51 p.m.)


Review request for giraph and Avery Ching.


Description (updated)
-------

I gave this another shot. This time it plays nicely with input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists in appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in and out of core or appending vertices requires a write-lock (we can't have concurrent writes).

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

More here: https://issues.apache.org/jira/browse/GIRAPH-249?focusedCommentId=13435280&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13435280


This addresses bug GIRAPH-249.
    https://issues.apache.org/jira/browse/GIRAPH-249


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/bsp/CentralizedServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/BasicRPCCommunications.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerClientServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyWorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/SendVertexRequest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/ServerData.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/WorkerServer.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/BspServiceWorker.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GiraphJob.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/GraphMapper.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/DiskBackedPartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/HashWorkerPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/Partition.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/PartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/SimplePartitionStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/graph/partition/WorkerGraphPartitioner.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/TestMutateGraphVertex.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/comm/RequestTest.java 1373331 
  http://svn.apache.org/repos/asf/giraph/trunk/src/test/java/org/apache/giraph/graph/partition/TestPartitionStores.java PRE-CREATION 

Diff: https://reviews.apache.org/r/5987/diff/


Testing (updated)
-------

mvn verify and pseudo-distributed mode tests with both SimplePartitionStore and DiskBackedPartitionStore


Thanks,

Alessandro Presta