You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@giraph.apache.org by "Avery Ching (JIRA)" <ji...@apache.org> on 2012/11/20 10:34:58 UTC
[jira] [Created] (GIRAPH-435) Serialize server messages for memory
and less GC
Avery Ching created GIRAPH-435:
----------------------------------
Summary: Serialize server messages for memory and less GC
Key: GIRAPH-435
URL: https://issues.apache.org/jira/browse/GIRAPH-435
Project: Giraph
Issue Type: Bug
Reporter: Avery Ching
Assignee: Avery Ching
Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-435) Serialize server messages for memory
and less GC
Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-435:
-------------------------------
Attachment: GIRAPH-435.3.patch
Updated to latest.
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.3.patch, GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-435) Serialize server messages for
memory and less GC
Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503998#comment-13503998 ]
Avery Ching commented on GIRAPH-435:
------------------------------------
Almost there, just working out an issue with FlushableMessageStore.
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-435) Serialize server messages for memory
and less GC
Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-435:
-------------------------------
Attachment: GIRAPH-435.patch
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (GIRAPH-435) Serialize server messages for memory
and less GC
Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Avery Ching updated GIRAPH-435:
-------------------------------
Attachment: GIRAPH-435.4.patch
Final patch committed.
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.3.patch, GIRAPH-435.4.patch, GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-435) Serialize server messages for
memory and less GC
Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13500965#comment-13500965 ]
Avery Ching commented on GIRAPH-435:
------------------------------------
https://reviews.apache.org/r/8140/
This patch serializes messages on the server into byte[] and also reduces the amount of objects being created by copying a ByteArrayVertexIdMessages object to a MessageStore and using a representative message when doing computation of the vertices.
- Added new helper classes ByteArrayIterable, ByteArrayIterator, RepresentativeByteArrayIterable, and RepresentativeByteArrayIterator to help create objects only when necessary
- Removed all collection methods from SimpleMessageStore and its children
- ByteArrayMessagesPerVertexStore stores messages on the server in serialized byte[] through ExtendedDataOutput (replaces CollectionOfMessagesPerVertexStore, which used objects)
- Iterators from ByteArrayVertexIdMessages are specialized to reduce the number of objects created (only when necessary)
- Allow the choice of message serialization with encoded message size (better for complex message objects) or just the message serialized (better for simple messages)
Results: 10 workers, 2.5 B edges, 10 m vertices, page rank
When not using the combiner, compute time on superstep 0 dropped from 12.05 -> 10.11, other supersteps improved accordingly as well. Free memory after superstep 2 improved from 26727.61 M -> 53102.54 M. GC time went down from 1,213.502 -> 518.597 (more than half). Overall application time dropped from 353.085 -> 244.434 seconds.
When using the combiner, there are still some modest gains due to the reduction in objects created (35 -> 29 seconds for superstep 0).
Trunk (no combiner):
INFO 2012-11-16 17:57:35,036 [compute-23] org.apache.giraph.graph.ComputeCallable - call: Computation took 12.054391 secs for 1 partitions on superstep 0. Flushing started
INFO 2012-11-16 17:57:35,081 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 61302.61231994629M
INFO 2012-11-16 17:59:04,322 [compute-13] org.apache.giraph.graph.ComputeCallable - call: Computation took 12.514889 secs for 1 partitions on superstep 1. Flushing started
INFO 2012-11-16 17:59:04,369 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 34318.67655181885M
INFO 2012-11-16 18:00:29,559 [compute-10] org.apache.giraph.graph.ComputeCallable - call: Computation took 11.869164 secs for 1 partitions on superstep 2. Flushing started
INFO 2012-11-16 18:00:29,602 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 26727.610847473145M
INFO 2012-11-16 18:01:45,788 [compute-5] org.apache.giraph.graph.ComputeCallable - call: Computation took 1.5657606 secs for 1 partitions on superstep 3. Flushing started
INFO 2012-11-16 18:01:45,789 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 38338.96612548828M
Giraph Timers Total (milliseconds) 353,085 0 353,085
Superstep 3 (milliseconds) 4,694 0 4,694
Setup (milliseconds) 2,354 0 2,354
Vertex input superstep (milliseconds) 71,891 0 71,891
Shutdown (milliseconds) 15,104 0 15,104
Superstep 0 (milliseconds) 85,516 0 85,516
Superstep 2 (milliseconds) 87,720 0 87,720
Superstep 1 (milliseconds) 85,803 0 85,803
Max heap size after GC in bytes 405,574,586,784 0 405,574,586,784
Total number of GC 1,823 0 1,823
Total time of GC in milliseconds 1,213,502 0 1,213,502
Heap size after last GC in bytes 398,867,750,728 0 398,867,750,728
GIRAPH-435 (no combiner)
INFO 2012-11-19 01:31:02,712 [compute-0] org.apache.giraph.graph.ComputeCallable - call: Computation took 10.109899 secs for 1 partitions on superstep 0. Flushing started
INFO 2012-11-19 01:31:02,754 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 0 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 63432.56399536133M
INFO 2012-11-19 01:31:55,448 [compute-7] org.apache.giraph.graph.ComputeCallable - call: Computation took 9.036797 secs for 1 partitions on superstep 1. Flushing started
INFO 2012-11-19 01:31:55,491 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 1 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 59648.34732055664M
INFO 2012-11-19 01:32:52,595 [compute-9] org.apache.giraph.graph.ComputeCallable - call: Computation took 8.786798 secs for 1 partitions on superstep 2. Flushing started
INFO 2012-11-19 01:32:52,782 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 2 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 53102.54949951172M
INFO 2012-11-19 01:33:40,104 [compute-15] org.apache.giraph.graph.ComputeCallable - call: Computation took 1.481184 secs for 1 partitions on superstep 3. Flushing started
INFO 2012-11-19 01:33:40,105 [main] org.apache.giraph.graph.BspServiceWorker - finishSuperstep: Waiting on all requests, superstep 3 totalMem = 81728.6875M, maxMem = 81728.6875M, freeMem = 65556.08256530762M
Total (milliseconds) 244,434 0 244,434
Superstep 3 (milliseconds) 4,865 0 4,865
oSetup (milliseconds) 2,210 0 2,210
Vertex input superstep (milliseconds) 73,466 0 73,466
Shutdown (milliseconds) 60 0 60
Superstep 0 (milliseconds) 51,097 0 51,097
Superstep 2 (milliseconds) 55,029 0 55,029
Superstep 1 (milliseconds) 57,704 0 57,704
Max heap size after GC in bytes 241,391,753,216 0 241,391,753,216
Total number of GC 1,614 0 1,614
Total time of GC in milliseconds 518,597 0 518,597
Heap size after last GC in bytes 211,137,611,752 0 211,137,611,752
Trunk (with combiner):
Total (milliseconds) 190,247 0 190,247
Superstep 3 (milliseconds) 4,441 0 4,441
Setup (milliseconds) 2,761 0 2,761
Vertex input superstep (milliseconds) 73,147 0 73,147
Shutdown (milliseconds) 359 0 359
Superstep 0 (milliseconds) 35,376 0 35,376
Superstep 2 (milliseconds) 35,896 0 35,896
Superstep 1 (milliseconds) 38,262 0 38,262
Max heap size after GC in bytes 182,641,373,552 0 182,641,373,552
Total number of GC 1,722 0 1,722
Total time of GC in milliseconds 418,019 0 418,019
Heap size after last GC in bytes 167,224,437,896 0 167,224,437,896
GIRAPH-435 (with combiner):
Total (milliseconds) 175,022 0 175,022
Superstep 3 (milliseconds) 4,716 0 4,716
Setup (milliseconds) 3,070 0 3,070
Vertex input superstep (milliseconds) 72,903 0 72,903
Shutdown (milliseconds) 208 0 208
Superstep 0 (milliseconds) 29,271 0 29,271
Superstep 2 (milliseconds) 31,667 0 31,667
Superstep 1 (milliseconds) 33,183 0 33,183
Max heap size after GC in bytes 176,287,403,976 0 176,287,403,976
Total number of GC 1,493 0 1,493
Total time of GC in milliseconds 354,427 0 354,427
Heap size after last GC in bytes 154,542,562,320 0 154,542,562,320
Passed unittests, page rank benchmark on a real cluster, internal applications showed nice improvement as well.
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-435) Serialize server messages for
memory and less GC
Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503861#comment-13503861 ]
Eli Reisman commented on GIRAPH-435:
------------------------------------
Is there a block on this or are we good to go on a commit?
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (GIRAPH-435) Serialize server messages for
memory and less GC
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/GIRAPH-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504996#comment-13504996 ]
Hudson commented on GIRAPH-435:
-------------------------------
Integrated in Giraph-trunk-Commit #296 (See [https://builds.apache.org/job/Giraph-trunk-Commit/296/])
GIRAPH-435: Serialize server messages for memory and less GC. (aching) (Revision 1414361)
Result = SUCCESS
aching : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1414361
Files :
* /giraph/trunk/CHANGELOG
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/GiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/ImmutableClassesGiraphConfiguration.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/SendMessageCache.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/BasicMessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/ByteArrayMessagesPerVertexStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/CollectionOfMessagesPerVertexStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/DiskBackedMessageStoreByPartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/MessageStoreByPartition.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/OneMessagePerVertexStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SequentialFileMessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/messages/SimpleMessageStore.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyClient.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/netty/NettyWorkerServer.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendPartitionCurrentMessagesRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/comm/requests/SendWorkerMessagesRequest.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/examples/SimpleMsgVertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/ComputeCallable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/graph/Vertex.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayIterator.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessageCollection.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ByteArrayVertexIdMessages.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/CollectionUtils.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/EmptyIterable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedByteArrayDataOutput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/ExtendedDataOutput.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterable.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/RepresentativeByteArrayIterator.java
* /giraph/trunk/giraph/src/main/java/org/apache/giraph/utils/UnsafeByteArrayOutputStream.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestFailureTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/RequestTest.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/comm/TestMessageStores.java
* /giraph/trunk/giraph/src/test/java/org/apache/giraph/utils/MockUtils.java
> Serialize server messages for memory and less GC
> ------------------------------------------------
>
> Key: GIRAPH-435
> URL: https://issues.apache.org/jira/browse/GIRAPH-435
> Project: Giraph
> Issue Type: Bug
> Reporter: Avery Ching
> Assignee: Avery Ching
> Attachments: GIRAPH-435.3.patch, GIRAPH-435.4.patch, GIRAPH-435.patch
>
>
> Keeping all messages as live objects on the server has a lot of GC overhead and also takes up a lot of memory.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira