Posted to dev@giraph.apache.org by Alessandro Presta <al...@fb.com> on 2013/02/16 01:29:06 UTC

Re: Review Request: GIRAPH-515: More efficient and flexible edge-based input

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/9449/
-----------------------------------------------------------

(Updated Feb. 16, 2013, 12:29 a.m.)


Review request for giraph.


Changes
-------

Addressed last comments.
Thanks Nitay and Maja for all the tips; they resulted in a much cleaner codebase.
Committing this.


Summary (updated)
-----------------

GIRAPH-515: More efficient and flexible edge-based input


Description
-----------

This patch adds the following classes:
- SendWorkerEdgesRequest: a request used to send edges during the input superstep, similar to the corresponding one for messages
- SendEdgeCache: similar to SendMessageCache
- ByteArrayVertexIdEdges: serialized representation for lists of edges (for different source vertices), similar to the corresponding one for messages
- EdgeStore: a server-side structure that stores transient edges from incoming requests, and later moves them to the owning vertices.
- ByteArrayEdges: an edge list (for a single source vertex) stored as a byte array. The default iterator reuses Edge objects, but an alternative iterator that instantiates new objects is also provided. Depending on the vertex implementation, we use one or the other.
  This is a refactor of the byte-array code in RepresentativeVertex, which now contains an instance of ByteArrayEdges.
  When setEdges() is called, RepresentativeVertex is smart enough to recognize that the passed Iterable is actually an instance of ByteArrayEdges, and simply takes ownership of it (without iterating).
  If using something like EdgeListVertex (which keeps references to the passed edges), we use the alternative iterable instead (this is, of course, less memory-efficient).
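
To illustrate the send side listed above, here is a minimal sketch of an edge cache in the spirit of SendEdgeCache. It assumes the cache batches edges per destination worker the way SendMessageCache does for messages, with each flushed batch playing the role of one SendWorkerEdgesRequest payload in a byte-array form loosely similar to ByteArrayVertexIdEdges. Everything here is hypothetical and simplified: SketchSendEdgeCache, maxBytesPerWorker, the sender callback, and partitioning by source id modulo the worker count are assumptions, not Giraph's API. Edges are bare (sourceId, targetId) longs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side edge cache: buffers serialized (sourceId, targetId)
 * pairs per destination worker and hands a worker's buffer to a sender once
 * it reaches a byte threshold.
 */
final class SketchSendEdgeCache {
  private final int numWorkers;
  private final int maxBytesPerWorker;
  private final BiConsumer<Integer, byte[]> sender; // stands in for sending a request
  private final List<ByteArrayOutputStream> buffers = new ArrayList<>();

  SketchSendEdgeCache(int numWorkers, int maxBytesPerWorker, BiConsumer<Integer, byte[]> sender) {
    this.numWorkers = numWorkers;
    this.maxBytesPerWorker = maxBytesPerWorker;
    this.sender = sender;
    for (int i = 0; i < numWorkers; i++) {
      buffers.add(new ByteArrayOutputStream());
    }
  }

  /** Assumed partitioning: the owning worker is the source id modulo the worker count. */
  int workerFor(long sourceId) {
    return (int) Math.floorMod(sourceId, (long) numWorkers);
  }

  void addEdge(long sourceId, long targetId) {
    int worker = workerFor(sourceId);
    ByteArrayOutputStream buf = buffers.get(worker);
    try {
      DataOutputStream out = new DataOutputStream(buf);
      out.writeLong(sourceId); // 8 bytes
      out.writeLong(targetId); // 8 bytes
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    if (buf.size() >= maxBytesPerWorker) {
      flush(worker);
    }
  }

  /** Sends one worker's pending edges as a single batch and resets its buffer. */
  void flush(int worker) {
    ByteArrayOutputStream buf = buffers.get(worker);
    if (buf.size() == 0) {
      return;
    }
    sender.accept(worker, buf.toByteArray());
    buf.reset();
  }

  /** Called at the end of input loading so that no edges stay buffered. */
  void flushAll() {
    for (int w = 0; w < numWorkers; w++) {
      flush(w);
    }
  }
}
```

Flushing on a byte threshold rather than per edge is what keeps the number of requests small; the threshold used in any test of this sketch is arbitrary.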

I've also renamed RepresentativeVertex to ByteArrayVertex because the old name was misleading: it doesn't need to be used with ByteArrayPartition, and it's perfectly fine to have multiple Vertex objects, each storing its edges in a byte array.
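
The ByteArrayEdges behavior described above (a reusing default iterator, a copying alternative, and setEdges() taking ownership when it recognizes a byte-array edge list) can be illustrated with a minimal sketch. All names are hypothetical stand-ins: SimpleEdge, SimpleByteArrayEdges, SketchByteArrayVertex and SketchEdgeListVertex play the roles of Edge, ByteArrayEdges, ByteArrayVertex and EdgeListVertex. Edges are fixed to (long target id, double value) pairs, and plain java.io streams replace Hadoop Writables.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical mutable edge: target vertex id plus edge value. */
final class SimpleEdge {
  long targetId;
  double value;
  SimpleEdge(long targetId, double value) { this.targetId = targetId; this.value = value; }
}

/** Edges of one source vertex, serialized back to back into a single byte array. */
final class SimpleByteArrayEdges implements Iterable<SimpleEdge> {
  private final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(bytes);
  private int numEdges;

  void add(long targetId, double value) {
    try {
      out.writeLong(targetId);
      out.writeDouble(value);
      numEdges++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Default: reuses ONE SimpleEdge object, so callers must not keep references to it. */
  @Override
  public Iterator<SimpleEdge> iterator() { return iterate(true); }

  /** Alternative: a fresh SimpleEdge per element (safe to retain, but more allocation). */
  Iterable<SimpleEdge> copyingIterable() { return () -> iterate(false); }

  private Iterator<SimpleEdge> iterate(boolean reuse) {
    int count = numEdges;
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
    SimpleEdge shared = new SimpleEdge(0L, 0.0);
    return new Iterator<SimpleEdge>() {
      private int read;
      @Override public boolean hasNext() { return read < count; }
      @Override public SimpleEdge next() {
        if (!hasNext()) throw new NoSuchElementException();
        try {
          long id = in.readLong();
          double v = in.readDouble();
          read++;
          if (!reuse) return new SimpleEdge(id, v);
          shared.targetId = id; // overwrite the shared object in place
          shared.value = v;
          return shared;
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    };
  }
}

/** Hypothetical byte-array vertex: takes ownership of SimpleByteArrayEdges when it can. */
final class SketchByteArrayVertex {
  private SimpleByteArrayEdges edges = new SimpleByteArrayEdges();

  void setEdges(Iterable<SimpleEdge> newEdges) {
    if (newEdges instanceof SimpleByteArrayEdges) {
      edges = (SimpleByteArrayEdges) newEdges; // already serialized: no iteration, no copy
      return;
    }
    SimpleByteArrayEdges copy = new SimpleByteArrayEdges();
    for (SimpleEdge e : newEdges) copy.add(e.targetId, e.value);
    edges = copy;
  }

  SimpleByteArrayEdges getEdges() { return edges; }
}

/** Hypothetical list-based vertex: keeps references to the edge objects it receives. */
final class SketchEdgeListVertex {
  private final List<SimpleEdge> edges = new ArrayList<>();

  void setEdges(Iterable<SimpleEdge> newEdges) {
    edges.clear();
    for (SimpleEdge e : newEdges) edges.add(e); // stores the reference itself
  }

  List<SimpleEdge> getEdges() { return edges; }
}
```

Feeding the reusing iterator into SketchEdgeListVertex would store the same object for every edge, which is why a reference-keeping vertex needs the copying view.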

Future work:

EdgeStore could become an interface, allowing for different implementations (e.g. out-of-core) and taking over permanent edge storage from Vertex. That way, we would have only one Vertex class with pluggable storage implementations, which makes it easier to switch between them without changing user code.
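
One possible shape for the EdgeStore interface proposed above, as a minimal sketch. The interface name SketchEdgeStore, its two methods, InMemoryEdgeStore and EdgeLoader are hypothetical (not Giraph's API); edges are reduced to (sourceId, targetId) longs, and "moving edges to vertices" is modeled as a callback receiving each source id with its edge list. The synchronized methods assume requests may arrive on several threads.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/** Hypothetical pluggable edge storage (names and methods are illustrative only). */
interface SketchEdgeStore {
  /** Buffers one transient edge received during the input superstep. */
  void addEdge(long sourceId, long targetId);

  /** Hands every buffered edge list to its owning vertex, then forgets it. */
  void moveEdgesToVertices(BiConsumer<Long, List<Long>> owner);
}

/** Simple in-memory implementation; an out-of-core one could spill to disk instead. */
final class InMemoryEdgeStore implements SketchEdgeStore {
  private final Map<Long, List<Long>> transientEdges = new HashMap<>();

  @Override
  public synchronized void addEdge(long sourceId, long targetId) {
    // Group edges by source vertex as they arrive.
    transientEdges.computeIfAbsent(sourceId, id -> new ArrayList<>()).add(targetId);
  }

  @Override
  public synchronized void moveEdgesToVertices(BiConsumer<Long, List<Long>> owner) {
    transientEdges.forEach(owner);
    transientEdges.clear();
  }
}

/** "User code" that depends only on the interface, so implementations can be swapped. */
final class EdgeLoader {
  static void load(SketchEdgeStore store, long[][] edges) {
    for (long[] edge : edges) {
      store.addEdge(edge[0], edge[1]);
    }
  }
}
```

Because EdgeLoader only sees SketchEdgeStore, replacing InMemoryEdgeStore with another implementation would not require touching it.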


This addresses bug GIRAPH-515.
    https://issues.apache.org/jira/browse/GIRAPH-515


Diffs (updated)
---------------

  giraph-core/src/main/java/org/apache/giraph/benchmark/ByteArrayVertexPageRankBenchmark.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/benchmark/MultiGraphByteArrayVertexPageRankBenchmark.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 19b08bdb19df21b1dc56dad2cebb499222f9b19e 
  giraph-core/src/main/java/org/apache/giraph/comm/SendCache.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/SendEdgeCache.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java 3cbf0eb4775fa3ff0b0351f247df87783bf05995 
  giraph-core/src/main/java/org/apache/giraph/comm/ServerData.java 3655d79d8f249338da30ae2bb38b9cfd6b7b1f56 
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerClientRequestProcessor.java 0c043e29ae3160bbfc389c435427cf57010a91e1 
  giraph-core/src/main/java/org/apache/giraph/comm/WorkerServer.java e60db5529b7fef0b16441ef88df7053d6856ffc5 
  giraph-core/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java 65caa5d2777b90fa8e14bee7c8d69316d512c651 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java d4e919ed1aa1f977a2e487531f57b3a2fc0fad47 
  giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1b7cc5410aa4d7e1b9ae4580dd5ed484e09ff7ed 
  giraph-core/src/main/java/org/apache/giraph/comm/requests/RequestType.java aac00289f915f61e61334cdcd92c93c1ef3b5419 
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerDataRequest.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerEdgesRequest.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java 641c795521006c460138d6b3b6d9ceb3c3e7eccf 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConfiguration.java 9e129efebe39c42bab9d59b3246055b79cdbdfa3 
  giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java 44d09c9462231874a9fed337215ac9fd650bb6d0 
  giraph-core/src/main/java/org/apache/giraph/conf/ImmutableClassesGiraphConfiguration.java 3e158afdc480656b3937508f5d86ec294bfa3b99 
  giraph-core/src/main/java/org/apache/giraph/graph/EdgeStore.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/partition/ByteArrayPartition.java 12989180a4aabed19c3aefa52ef38ad6d7aa6851 
  giraph-core/src/main/java/org/apache/giraph/partition/DiskBackedPartitionStore.java 725de39c4dfd2249a40203f62d93e9d0b246240b 
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayEdges.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdData.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdEdges.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java dea4229f10224edb30f59626d5987ea840e8a271 
  giraph-core/src/main/java/org/apache/giraph/utils/VertexIdIterator.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/utils/WritableUtils.java fefe9a09b0570b8f6626243a2e51f386e18f2fe0 
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertex.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/vertex/ByteArrayVertexBase.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/vertex/EdgeListVertex.java 9ae692fc00432e28f0b87f11ed5981e600c95019 
  giraph-core/src/main/java/org/apache/giraph/vertex/MultiGraphByteArrayVertex.java PRE-CREATION 
  giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java fa3ab49f11d61352a5f6f69699375abd2bf1e527 
  giraph-core/src/main/java/org/apache/giraph/worker/EdgeInputSplitsCallable.java bdf9f5705811340748172a70dc952493d5ececfc 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 2845c90cbfd38f2f35e70e3b79767e1386d54a7e 
  giraph-core/src/test/java/org/apache/giraph/comm/RequestTest.java d779fe46377eaa8fa2debf0836f975a30ec6e21f 
  giraph-core/src/test/java/org/apache/giraph/utils/MockUtils.java 82dc2839d83f80ebcf52bad252886d50310eacc5 
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMultiGraphVertex.java a5a3545de7dc9e30ab0f30926122049fdbe1173b 
  giraph-core/src/test/java/org/apache/giraph/vertex/TestMutableVertex.java ca4ba1a336f68b584c4fdbaf74be60dbe41644d5 

Diff: https://reviews.apache.org/r/9449/diff/


Testing
-------

mvn verify

Tested on both benchmarks and real-world applications.
This typically reduces resource requirements substantially: an application with a few hundred billion edges, which previously ran only with 300 workers, now runs with 100 workers, with plenty of memory to spare and even faster than before (from around 600s to 400s).
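
A rough reading of those figures in worker-seconds, assuming worker count times wall-clock time is a fair proxy for total cost (an assumption; both times are approximate):

```latex
\[
\frac{300\ \text{workers} \times 600\,\text{s}}{100\ \text{workers} \times 400\,\text{s}}
= \frac{180{,}000}{40{,}000} = 4.5
\]
```

That is roughly 4.5 times less total worker time for the same job.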


Thanks,

Alessandro Presta