You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by Maja Kabiljo <ma...@fb.com> on 2012/11/08 23:56:52 UTC

Review Request: Create BinaryCombiner ands specialized message store for it

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/
-----------------------------------------------------------

Review request for giraph.


Description
-------

Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.


This addresses bug GIRAPH-414.
    https://issues.apache.org/jira/browse/GIRAPH-414


Diffs
-----

  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 

Diff: https://reviews.apache.org/r/7975/diff/


Testing
-------

mvn verify

PageRankBenchmark
20m vertices, 100 edges per vertex, 20 workers
1 compute thread, superstep time 55s->45s
6 compute threads, superstep time 28s->15s
12 compute threads, 1 netty server thread, superstep time 185s->112s

Our internal application
Similar speedup as Page Rank


Thanks,

Maja Kabiljo


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13385
-----------------------------------------------------------

Ship it!


+1, thanks Maja!  Please make sure 'mvn clean install' works before your commit.

- Avery Ching


On Nov. 13, 2012, 4:40 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 13, 2012, 4:40 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1408573 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1408573 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/
-----------------------------------------------------------

(Updated Nov. 13, 2012, 4:40 a.m.)


Review request for giraph.


Changes
-------

Avery's comments


Description
-------

Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.


This addresses bug GIRAPH-414.
    https://issues.apache.org/jira/browse/GIRAPH-414


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1408573 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1408573 

Diff: https://reviews.apache.org/r/7975/diff/


Testing
-------

mvn verify

PageRankBenchmark
20m vertices, 100 edges per vertex, 20 workers
1 compute thread, superstep time 55s->45s
6 compute threads, superstep time 28s->15s
12 compute threads, 1 netty server thread, superstep time 185s->112s

Our internal application
Similar speedup as Page Rank


Thanks,

Maja Kabiljo


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13375
-----------------------------------------------------------


This is almost ready to go.  Looks great, although I will need to get rid of all your collection methods in the future (don't worry about it now) to support byte [] based messaging.


http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
<https://reviews.apache.org/r/7975/#comment28657>

    Given the way we are using this, should we rename getNeutralMessage() to createInitialMessage()?



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
<https://reviews.apache.org/r/7975/#comment28664>

    Perhaps a bit more info would be nice
    
    "Using OneMessagePervertexStore since combiner enabled"
    
    



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java
<https://reviews.apache.org/r/7975/#comment28651>

    I should have mentioned this earlier but 
    
    getNeutralMessage() sounds a bit strange.  


- Avery Ching


On Nov. 12, 2012, 6:39 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 12, 2012, 6:39 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/
-----------------------------------------------------------

(Updated Nov. 12, 2012, 6:39 p.m.)


Review request for giraph.


Changes
-------

This is with just one combiner type.


Description
-------

Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.


This addresses bug GIRAPH-414.
    https://issues.apache.org/jira/browse/GIRAPH-414


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 

Diff: https://reviews.apache.org/r/7975/diff/


Testing
-------

mvn verify

PageRankBenchmark
20m vertices, 100 edges per vertex, 20 workers
1 compute thread, superstep time 55s->45s
6 compute threads, superstep time 28s->15s
12 compute threads, 1 netty server thread, superstep time 185s->112s

Our internal application
Similar speedup as Page Rank


Thanks,

Maja Kabiljo


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Avery Ching <av...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13307
-----------------------------------------------------------


I don't have a super strong preference, but to me it seems like we should just have the BinaryCombiner.  There is a lot of code that is simplified if we only had one.

- Avery Ching


On Nov. 9, 2012, 10:34 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 10:34 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/
-----------------------------------------------------------

(Updated Nov. 9, 2012, 10:34 p.m.)


Review request for giraph.


Changes
-------

Fixed the comments.


Description
-------

Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.


This addresses bug GIRAPH-414.
    https://issues.apache.org/jira/browse/GIRAPH-414


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 

Diff: https://reviews.apache.org/r/7975/diff/


Testing
-------

mvn verify

PageRankBenchmark
20m vertices, 100 edges per vertex, 20 workers
1 compute thread, superstep time 55s->45s
6 compute threads, superstep time 28s->15s
12 compute threads, 1 netty server thread, superstep time 185s->112s

Our internal application
Similar speedup as Page Rank


Thanks,

Maja Kabiljo


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13277
-----------------------------------------------------------


The empty interface solution works, but it may be confusing for the user.
If no one has a case for the old interface, I would say let's keep it simple and only have one kind of combiner.
Hopefully someone else will comment on this.

- Alessandro Presta


On Nov. 9, 2012, 1:44 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 1:44 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.

> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > The results look good. How does the no-combiner version fare in these same benchmarks?
> > 
> > Frankly, at this point I would just go for one combiner interface (the binary one).
> > No need to use it in the disk-backed message store (if we have one message per vertex, there is no point in spilling to disk).
> > 
> > The only use I can think of for the Iterable->Iterable form is when you're doing some sort of top-k computation, but I'm not sure if we would reap any benefits when the graph is sparse.
> > 
> > Regarding the interface, I think two sensible options are:
> > 1) M combine(M a, M b)
> > 2) M combine(Iterable<M> messages)
> > 
> > Why 2)? We may find that when processing a list of incoming messages, calling combine() for each of them has an overhead, whereas a loop inside combine() would be faster.
> > However, if we also use the binary combiner on the sender (which may be a good idea if it's fast enough), all the requests will have at most one message per vertex, so then 1) would be the way to go (assuming we call the combiner each time we process a request).

The way messaging is implemented right now, we don't have a list of messages per vertex. Please take a look at SendWorkerMessagesRequest. So there is no iterative combiner calls. 
This was discussed before, we get the chance to use the combiner on the client side very rarely, so benefit of having just lists instead of maps on the client side is much bigger then the one we could get from combiner.


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java, line 56
> > <https://reviews.apache.org/r/7975/diff/1/?file=187405#file187405line56>
> >
> >     We could get rid of this method by using the following logic when processing messages:
> >     - if the current message is null, just set it to the incoming one
> >     - otherwise, combine them

The issue here is that we can't always modify the message which message store receives, because of local requests and sendMessageToAllEdges which keeps just one copy of message. 


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java, line 47
> > <https://reviews.apache.org/r/7975/diff/1/?file=187405#file187405line47>
> >
> >     Do we have evidence that this in-place version is much better than the obvious M combine(M a, M b)? I understand the idea (minimizing object instantiation and garbage-collection), just want to make sure we make informed choices when going with a less-intuitive interface.
> >     
> >     I'm also suggesting that the vertex index is not needed. I see the parallel with the key in a MapReduce Reducer, but I'm not sure how it would help here to know just the vertex id but not the value.

I'll run some experiments, there are two steps which we are skipping by having this interface: object creation, and putting object back to the map.

Went with vertexId because it was in the original version. Does anyone have a reason why we might need it there?


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java, line 35
> > <https://reviews.apache.org/r/7975/diff/1/?file=187397#file187397line35>
> >
> >     I find it confusing that we talk about "message store per vertex" when it isn't really a MessageStore.
> >     Also, I think here you mean "a single message" instead of "a collection of messages".

I'll revisit the comment


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java, line 92
> > <https://reviews.apache.org/r/7975/diff/1/?file=187393#file187393line92>
> >
> >     Shouldn't you change the "vertex combiner" terminology to "multi combiner" across the board?

As I said in my comment, I haven't cleaned everything up because I am not sure what is the best way to deal with two combiner interfaces. Please take a look at the updated diff I posted. 


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13268
-----------------------------------------------------------


On Nov. 9, 2012, 1:44 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 1:44 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Alessandro Presta <al...@fb.com>.

> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > The results look good. How does the no-combiner version fare in these same benchmarks?
> > 
> > Frankly, at this point I would just go for one combiner interface (the binary one).
> > No need to use it in the disk-backed message store (if we have one message per vertex, there is no point in spilling to disk).
> > 
> > The only use I can think of for the Iterable->Iterable form is when you're doing some sort of top-k computation, but I'm not sure if we would reap any benefits when the graph is sparse.
> > 
> > Regarding the interface, I think two sensible options are:
> > 1) M combine(M a, M b)
> > 2) M combine(Iterable<M> messages)
> > 
> > Why 2)? We may find that when processing a list of incoming messages, calling combine() for each of them has an overhead, whereas a loop inside combine() would be faster.
> > However, if we also use the binary combiner on the sender (which may be a good idea if it's fast enough), all the requests will have at most one message per vertex, so then 1) would be the way to go (assuming we call the combiner each time we process a request).
> 
> Maja Kabiljo wrote:
>     The way messaging is implemented right now, we don't have a list of messages per vertex. Please take a look at SendWorkerMessagesRequest. So there is no iterative combiner calls. 
>     This was discussed before, we get the chance to use the combiner on the client side very rarely, so benefit of having just lists instead of maps on the client side is much bigger then the one we could get from combiner.
> 
> Maja Kabiljo wrote:
>     Forgot to mention, with PageRank no-combiner is slower than MultiCombiner. With our internal application for some reason no-combiner is slightly faster than MultiCombiner, but slower than BinaryCombiner.

Oh ok, I forgot that we switched to a list of (id, message) pairs on the client.


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java, line 56
> > <https://reviews.apache.org/r/7975/diff/1/?file=187405#file187405line56>
> >
> >     We could get rid of this method by using the following logic when processing messages:
> >     - if the current message is null, just set it to the incoming one
> >     - otherwise, combine them
> 
> Maja Kabiljo wrote:
>     The issue here is that we can't always modify the message which message store receives, because of local requests and sendMessageToAllEdges which keeps just one copy of message.

Got it. We could make a copy of the first message then.


- Alessandro


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13268
-----------------------------------------------------------


On Nov. 9, 2012, 1:44 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 1:44 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.

> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java, line 47
> > <https://reviews.apache.org/r/7975/diff/1/?file=187405#file187405line47>
> >
> >     Do we have evidence that this in-place version is much better than the obvious M combine(M a, M b)? I understand the idea (minimizing object instantiation and garbage-collection), just want to make sure we make informed choices when going with a less-intuitive interface.
> >     
> >     I'm also suggesting that the vertex index is not needed. I see the parallel with the key in a MapReduce Reducer, but I'm not sure how it would help here to know just the vertex id but not the value.
> 
> Maja Kabiljo wrote:
>     I'll run some experiments, there are two steps which we are skipping by having this interface: object creation, and putting object back to the map.
>     
>     Went with vertexId because it was in the original version. Does anyone have a reason why we might need it there?

I tried one benchmark from above:
  6 compute threads, superstep time 28s->15s
and with M combine(M, M) interface it took 21s per superstep.


> On Nov. 9, 2012, 1:59 a.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java, line 56
> > <https://reviews.apache.org/r/7975/diff/1/?file=187405#file187405line56>
> >
> >     We could get rid of this method by using the following logic when processing messages:
> >     - if the current message is null, just set it to the incoming one
> >     - otherwise, combine them
> 
> Maja Kabiljo wrote:
>     The issue here is that we can't always modify the message which message store receives, because of local requests and sendMessageToAllEdges which keeps just one copy of message.
> 
> Alessandro Presta wrote:
>     Got it. We could make a copy of the first message then.

Do we have a way to do this? The only thing which I can think of is to serialize the message, and then create a new one and deserialize, but that's silly :-)


- Maja


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13268
-----------------------------------------------------------


On Nov. 9, 2012, 10:34 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 10:34 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Alessandro Presta <al...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13268
-----------------------------------------------------------


The results look good. How does the no-combiner version fare in these same benchmarks?

Frankly, at this point I would just go for one combiner interface (the binary one).
No need to use it in the disk-backed message store (if we have one message per vertex, there is no point in spilling to disk).

The only use I can think of for the Iterable->Iterable form is when you're doing some sort of top-k computation, but I'm not sure if we would reap any benefits when the graph is sparse.

Regarding the interface, I think two sensible options are:
1) M combine(M a, M b)
2) M combine(Iterable<M> messages)

Why 2)? We may find that when processing a list of incoming messages, calling combine() for each of them has an overhead, whereas a loop inside combine() would be faster.
However, if we also use the binary combiner on the sender (which may be a good idea if it's fast enough), all the requests will have at most one message per vertex, so then 1) would be the way to go (assuming we call the combiner each time we process a request).


http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
<https://reviews.apache.org/r/7975/#comment28509>

    Shouldn't you change the "vertex combiner" terminology to "multi combiner" across the board?



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
<https://reviews.apache.org/r/7975/#comment28527>

    I find it confusing that we talk about "message store per vertex" when it isn't really a MessageStore.
    Also, I think here you mean "a single message" instead of "a collection of messages".



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
<https://reviews.apache.org/r/7975/#comment28530>

    This won't be needed if we do what I suggest in the other comment (don't run the combiner on the first message).



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java
<https://reviews.apache.org/r/7975/#comment28516>

    Do we have evidence that this in-place version is much better than the obvious M combine(M a, M b)? I understand the idea (minimizing object instantiation and garbage-collection), just want to make sure we make informed choices when going with a less-intuitive interface.
    
    I'm also suggesting that the vertex index is not needed. I see the parallel with the key in a MapReduce Reducer, but I'm not sure how it would help here to know just the vertex id but not the value.



http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java
<https://reviews.apache.org/r/7975/#comment28526>

    We could get rid of this method by using the following logic when processing messages:
    - if the current message is null, just set it to the incoming one
    - otherwise, combine them


- Alessandro Presta


On Nov. 9, 2012, 1:44 a.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 9, 2012, 1:44 a.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/
-----------------------------------------------------------

(Updated Nov. 9, 2012, 1:44 a.m.)


Review request for giraph.


Changes
-------

Here is what I had in mind with creating an empty Combiner.


Description
-------

Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.


This addresses bug GIRAPH-414.
    https://issues.apache.org/jira/browse/GIRAPH-414


Diffs (updated)
-----

  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Combiner.java PRE-CREATION 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
  http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 

Diff: https://reviews.apache.org/r/7975/diff/


Testing
-------

mvn verify

PageRankBenchmark
20m vertices, 100 edges per vertex, 20 workers
1 compute thread, superstep time 55s->45s
6 compute threads, superstep time 28s->15s
12 compute threads, 1 netty server thread, superstep time 185s->112s

Our internal application
Similar speedup as Page Rank


Thanks,

Maja Kabiljo


Re: Review Request: Create BinaryCombiner ands specialized message store for it

Posted by Maja Kabiljo <ma...@fb.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7975/#review13263
-----------------------------------------------------------


This patch is not finished, one thing I wanted to hear your input on is what's the best way to deal with having two different combiner interfaces. 

I did it in a way that user just sets giraph.combinerClass to any of these types, but this creates an issue on a few places. For example we can't call stuff like conf.getClass(...) without knowing the interface of the combiner in advance. We can also have two different parameters to set, say giraph.binaryCombinerClass and giraph.multiCombinerClass, but having one parameter seems nicer.
Also places like InternalVertexRunner would need to change signatures to accept two kinds of combiner.

Does it makes sense to have some empty Combiner interface, and then BinaryCombiner and MultiCombiner both extend it, and we can use that interface on these problematic places?

I am also going to change all current combiners to BinaryCombiners. 

- Maja Kabiljo


On Nov. 8, 2012, 10:56 p.m., Maja Kabiljo wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7975/
> -----------------------------------------------------------
> 
> (Updated Nov. 8, 2012, 10:56 p.m.)
> 
> 
> Review request for giraph.
> 
> 
> Description
> -------
> 
> Current combiner interface is very general, but also doesn't provide the best performance. All the combiners we currently have are binary combiners, i.e. they can combine two messages into one. Having a lists around this simple concept makes it slower and requires more object creations.
> Adding BinaryCombiner, and a specialized message store which will be used with it. This message store has only one message per vertex instead of having a collection per vertex.
> 
> 
> This addresses bug GIRAPH-414.
>     https://issues.apache.org/jira/browse/GIRAPH-414
> 
> 
> Diffs
> -----
> 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/benchmark/PageRankBenchmark.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumBinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/DoubleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumDoubleCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/MinimumIntCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleSumCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BinaryCombiner.java PRE-CREATION 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/BspUtils.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/GiraphTypeValidator.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/VertexCombiner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/InternalVertexRunner.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/TestVertexTypes.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/ConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/SaslConnectionTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/examples/MinimumIntCombinerTest.java 1406748 
>   http://svn.apache.org/repos/asf/giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java 1406748 
> 
> Diff: https://reviews.apache.org/r/7975/diff/
> 
> 
> Testing
> -------
> 
> mvn verify
> 
> PageRankBenchmark
> 20m vertices, 100 edges per vertex, 20 workers
> 1 compute thread, superstep time 55s->45s
> 6 compute threads, superstep time 28s->15s
> 12 compute threads, 1 netty server thread, superstep time 185s->112s
> 
> Our internal application
> Similar speedup as Page Rank
> 
> 
> Thanks,
> 
> Maja Kabiljo
> 
>