Posted to dev@giraph.apache.org by "Claudio Martella (JIRA)" <ji...@apache.org> on 2013/02/03 15:26:12 UTC

[jira] [Commented] (GIRAPH-461) Convert static assignment of in-memory partitions with LRU cache

    [ https://issues.apache.org/jira/browse/GIRAPH-461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569779#comment-13569779 ] 

Claudio Martella commented on GIRAPH-461:
-----------------------------------------


hadoop jar giraph-0.2-SNAPSHOT-for-hadoop-1.0.2-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -w 60 -c 2 -e 100 -V 10000000 -v -s 10

trunk:
13/01/29 20:40:53 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:40:53 INFO mapred.JobClient:     Total (milliseconds)=492403
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 3 (milliseconds)=40243
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 4 (milliseconds)=45430
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 10 (milliseconds)=713
13/01/29 20:40:53 INFO mapred.JobClient:     Setup (milliseconds)=20832
13/01/29 20:40:53 INFO mapred.JobClient:     Shutdown (milliseconds)=56
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 7 (milliseconds)=36753
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 9 (milliseconds)=36363
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 0 (milliseconds)=39558
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 8 (milliseconds)=44548
13/01/29 20:40:53 INFO mapred.JobClient:     Input superstep (milliseconds)=59184
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 6 (milliseconds)=40777
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 5 (milliseconds)=43962
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 2 (milliseconds)=37325
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep 1 (milliseconds)=46655
13/01/29 20:40:53 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:40:53 INFO mapred.JobClient:     Superstep=11
13/01/29 20:40:53 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:40:53 INFO mapred.JobClient:     Current workers=60
13/01/29 20:40:53 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:40:53 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:40:53 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:40:53 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:40:53 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:40:53 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:40:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:40:53 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:40:53 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:40:53 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:40:53 INFO mapred.JobClient:     Map input records=61
13/01/29 20:40:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71703965696
13/01/29 20:40:53 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:40:53 INFO mapred.JobClient:     CPU time spent (ms)=15141630
13/01/29 20:40:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=58151337984
13/01/29 20:40:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371313995776
13/01/29 20:40:53 INFO mapred.JobClient:     Map output records=0
13/01/29 20:40:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

GIRAPH-461:
in memory:
13/01/29 19:35:53 INFO mapred.JobClient:   Giraph Timers
13/01/29 19:35:53 INFO mapred.JobClient:     Total (milliseconds)=427511
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 3 (milliseconds)=37341
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 4 (milliseconds)=35458
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 10 (milliseconds)=852
13/01/29 19:35:53 INFO mapred.JobClient:     Setup (milliseconds)=24825
13/01/29 19:35:53 INFO mapred.JobClient:     Shutdown (milliseconds)=50
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 7 (milliseconds)=37557
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 9 (milliseconds)=33961
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 0 (milliseconds)=33048
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 8 (milliseconds)=36345
13/01/29 19:35:53 INFO mapred.JobClient:     Input superstep (milliseconds)=44420
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 6 (milliseconds)=33635
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 5 (milliseconds)=41885
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 2 (milliseconds)=35046
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep 1 (milliseconds)=33083
13/01/29 19:35:53 INFO mapred.JobClient:   Giraph Stats
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 19:35:53 INFO mapred.JobClient:     Superstep=11
13/01/29 19:35:53 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 19:35:53 INFO mapred.JobClient:     Current workers=60
13/01/29 19:35:53 INFO mapred.JobClient:     Current master task partition=0
13/01/29 19:35:53 INFO mapred.JobClient:     Sent messages=0
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 19:35:53 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 19:35:53 INFO mapred.JobClient:     Bytes Written=0
13/01/29 19:35:53 INFO mapred.JobClient:   FileSystemCounters
13/01/29 19:35:53 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 19:35:53 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 19:35:53 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 19:35:53 INFO mapred.JobClient:     Bytes Read=0
13/01/29 19:35:53 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 19:35:53 INFO mapred.JobClient:     Map input records=61
13/01/29 19:35:53 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71627419648
13/01/29 19:35:53 INFO mapred.JobClient:     Spilled Records=0
13/01/29 19:35:53 INFO mapred.JobClient:     CPU time spent (ms)=15020990
13/01/29 19:35:53 INFO mapred.JobClient:     Total committed heap usage (bytes)=57611911168
13/01/29 19:35:53 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371123154944
13/01/29 19:35:53 INFO mapred.JobClient:     Map output records=0
13/01/29 19:35:53 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49):
13/01/29 19:54:57 INFO mapred.JobClient:   Giraph Timers
13/01/29 19:54:57 INFO mapred.JobClient:     Total (milliseconds)=508004
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 3 (milliseconds)=38085
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 4 (milliseconds)=40789
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 10 (milliseconds)=811
13/01/29 19:54:57 INFO mapred.JobClient:     Setup (milliseconds)=25612
13/01/29 19:54:57 INFO mapred.JobClient:     Shutdown (milliseconds)=699
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 7 (milliseconds)=44806
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 9 (milliseconds)=41873
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 0 (milliseconds)=46329
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 8 (milliseconds)=46272
13/01/29 19:54:57 INFO mapred.JobClient:     Input superstep (milliseconds)=52395
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 6 (milliseconds)=44337
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 5 (milliseconds)=39379
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 2 (milliseconds)=40452
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep 1 (milliseconds)=46155
13/01/29 19:54:57 INFO mapred.JobClient:   Giraph Stats
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 19:54:57 INFO mapred.JobClient:     Superstep=11
13/01/29 19:54:57 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 19:54:57 INFO mapred.JobClient:     Current workers=60
13/01/29 19:54:57 INFO mapred.JobClient:     Current master task partition=0
13/01/29 19:54:57 INFO mapred.JobClient:     Sent messages=0
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 19:54:57 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 19:54:57 INFO mapred.JobClient:     Bytes Written=0
13/01/29 19:54:57 INFO mapred.JobClient:   FileSystemCounters
13/01/29 19:54:57 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 19:54:57 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 19:54:57 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 19:54:57 INFO mapred.JobClient:     Bytes Read=0
13/01/29 19:54:57 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 19:54:57 INFO mapred.JobClient:     Map input records=61
13/01/29 19:54:57 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71368736768
13/01/29 19:54:57 INFO mapred.JobClient:     Spilled Records=0
13/01/29 19:54:57 INFO mapred.JobClient:     CPU time spent (ms)=15289390
13/01/29 19:54:57 INFO mapred.JobClient:     Total committed heap usage (bytes)=57278595072
13/01/29 19:54:57 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=370911342592
13/01/29 19:54:57 INFO mapred.JobClient:     Map output records=0
13/01/29 19:54:57 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

in memory (2 compute threads per worker):
13/01/29 20:30:49 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:30:49 INFO mapred.JobClient:     Total (milliseconds)=487379
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 3 (milliseconds)=46092
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 4 (milliseconds)=44840
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 10 (milliseconds)=745
13/01/29 20:30:49 INFO mapred.JobClient:     Setup (milliseconds)=23013
13/01/29 20:30:49 INFO mapred.JobClient:     Shutdown (milliseconds)=126
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 7 (milliseconds)=40620
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 9 (milliseconds)=39630
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 0 (milliseconds)=38221
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 8 (milliseconds)=40406
13/01/29 20:30:49 INFO mapred.JobClient:     Input superstep (milliseconds)=49762
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 6 (milliseconds)=45054
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 5 (milliseconds)=40220
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 2 (milliseconds)=40817
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep 1 (milliseconds)=37830
13/01/29 20:30:49 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:30:49 INFO mapred.JobClient:     Superstep=11
13/01/29 20:30:49 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:30:49 INFO mapred.JobClient:     Current workers=60
13/01/29 20:30:49 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:30:49 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:30:49 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:30:49 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:30:49 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:30:49 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:30:49 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:30:49 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:30:49 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:30:49 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:30:49 INFO mapred.JobClient:     Map input records=61
13/01/29 20:30:49 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71895678976
13/01/29 20:30:49 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:30:49 INFO mapred.JobClient:     CPU time spent (ms)=15134650
13/01/29 20:30:49 INFO mapred.JobClient:     Total committed heap usage (bytes)=57982255104
13/01/29 20:30:49 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=371448213504
13/01/29 20:30:49 INFO mapred.JobClient:     Map output records=0
13/01/29 20:30:49 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684

out-of-core graph (2 partitions in memory out of 49, 2 compute threads per worker):
13/01/29 20:11:28 INFO mapred.JobClient:   Giraph Timers
13/01/29 20:11:28 INFO mapred.JobClient:     Total (milliseconds)=506380
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 3 (milliseconds)=41677
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 4 (milliseconds)=41285
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 10 (milliseconds)=764
13/01/29 20:11:28 INFO mapred.JobClient:     Setup (milliseconds)=24574
13/01/29 20:11:28 INFO mapred.JobClient:     Shutdown (milliseconds)=82
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 7 (milliseconds)=43183
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 9 (milliseconds)=46654
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 0 (milliseconds)=50955
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 8 (milliseconds)=40413
13/01/29 20:11:28 INFO mapred.JobClient:     Input superstep (milliseconds)=43584
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 6 (milliseconds)=46638
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 5 (milliseconds)=46107
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 2 (milliseconds)=39321
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep 1 (milliseconds)=41139
13/01/29 20:11:28 INFO mapred.JobClient:   Giraph Stats
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate edges=1000000000
13/01/29 20:11:28 INFO mapred.JobClient:     Superstep=11
13/01/29 20:11:28 INFO mapred.JobClient:     Last checkpointed superstep=0
13/01/29 20:11:28 INFO mapred.JobClient:     Current workers=60
13/01/29 20:11:28 INFO mapred.JobClient:     Current master task partition=0
13/01/29 20:11:28 INFO mapred.JobClient:     Sent messages=0
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate finished vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient:     Aggregate vertices=10000000
13/01/29 20:11:28 INFO mapred.JobClient:   File Output Format Counters 
13/01/29 20:11:28 INFO mapred.JobClient:     Bytes Written=0
13/01/29 20:11:28 INFO mapred.JobClient:   FileSystemCounters
13/01/29 20:11:28 INFO mapred.JobClient:     HDFS_BYTES_READ=2684
13/01/29 20:11:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1388228
13/01/29 20:11:28 INFO mapred.JobClient:   File Input Format Counters 
13/01/29 20:11:28 INFO mapred.JobClient:     Bytes Read=0
13/01/29 20:11:28 INFO mapred.JobClient:   Map-Reduce Framework
13/01/29 20:11:28 INFO mapred.JobClient:     Map input records=61
13/01/29 20:11:28 INFO mapred.JobClient:     Physical memory (bytes) snapshot=71620620288
13/01/29 20:11:28 INFO mapred.JobClient:     Spilled Records=0
13/01/29 20:11:28 INFO mapred.JobClient:     CPU time spent (ms)=15279810
13/01/29 20:11:28 INFO mapred.JobClient:     Total committed heap usage (bytes)=57294782464
13/01/29 20:11:28 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=370988941312
13/01/29 20:11:28 INFO mapred.JobClient:     Map output records=0
13/01/29 20:11:28 INFO mapred.JobClient:     SPLIT_RAW_BYTES=2684
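
For a quick side-by-side reading, the "Total (milliseconds)" timers of the five runs above can be expressed as a change relative to trunk. The following is only a convenience sketch: the class and method names are made up here, and the numbers are copied from the logs above. Each configuration was run once, so the differences should be read as indicative rather than conclusive.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Giraph: compares the "Total (milliseconds)"
// timers reported in the job output above against the trunk run.
class TotalTimerComparison {

    /** Percentage change of a run's total time relative to the baseline (negative = faster). */
    static double percentChange(long baselineMs, long runMs) {
        return 100.0 * (runMs - baselineMs) / baselineMs;
    }

    public static void main(String[] args) {
        long trunkMs = 492403L; // trunk, Total (milliseconds)

        // Totals copied from the four GIRAPH-461 runs, in the order they appear above.
        Map<String, Long> runs = new LinkedHashMap<>();
        runs.put("in memory", 427511L);
        runs.put("out-of-core, 2 of 49 in memory", 508004L);
        runs.put("in memory, 2 compute threads", 487379L);
        runs.put("out-of-core, 2 of 49, 2 compute threads", 506380L);

        System.out.printf("%-42s %7d ms%n", "trunk", trunkMs);
        for (Map.Entry<String, Long> run : runs.entrySet()) {
            System.out.printf("%-42s %7d ms  %+6.1f%%%n",
                run.getKey(), run.getValue(), percentChange(trunkMs, run.getValue()));
        }
    }
}
```

Running the class prints one line per run with its total and its signed percentage difference from trunk, which makes it easier to see at a glance which configurations finished faster or slower than the baseline.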

> Convert static assignment of in-memory partitions with LRU cache
> ----------------------------------------------------------------
>
>                 Key: GIRAPH-461
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-461
>             Project: Giraph
>          Issue Type: Sub-task
>          Components: graph
>            Reporter: Claudio Martella
>         Attachments: GIRAPH-461.patch, GIRAPH-461.patch
>
>
> Currently, out-of-core partitions are assigned to memory or to disk statically. An LRU cache should help keep in memory only the partitions that are actively accessed, provided the job does not touch the whole graph at every superstep (e.g. traversals) and the data is well partitioned (not randomly).
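
To make the LRU idea in the description concrete, here is a minimal sketch built on java.util.LinkedHashMap in access order. It is not the code in GIRAPH-461.patch: the class name, the generic partition type P and the spilledIds() list are assumptions made for illustration, and where a real partition store would write an evicted partition to disk, this sketch only records its id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an LRU policy for in-memory partitions.
// Not Giraph's partition store; eviction only records the partition id.
class LruPartitionCache<P> {
    private final int maxInMemory;
    private final List<Integer> spilled = new ArrayList<>();
    private final LinkedHashMap<Integer, P> inMemory;

    LruPartitionCache(int maxInMemory) {
        this.maxInMemory = maxInMemory;
        // accessOrder = true: iteration order runs from least to most recently accessed.
        this.inMemory = new LinkedHashMap<Integer, P>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, P> eldest) {
                if (size() > LruPartitionCache.this.maxInMemory) {
                    // A real store would serialize eldest.getValue() to disk here.
                    spilled.add(eldest.getKey());
                    return true;
                }
                return false;
            }
        };
    }

    /** Adds a partition, evicting the least recently used one if the limit is exceeded. */
    void put(int partitionId, P partition) {
        inMemory.put(partitionId, partition);
    }

    /** Returns the partition if it is in memory and marks it as recently used. */
    P get(int partitionId) {
        return inMemory.get(partitionId);
    }

    /** Checks residency without changing the access order. */
    boolean isInMemory(int partitionId) {
        return inMemory.containsKey(partitionId);
    }

    /** Ids of partitions evicted so far, in eviction order. */
    List<Integer> spilledIds() {
        return spilled;
    }
}
```

With a limit of two partitions, adding partitions 0 and 1, reading partition 0, and then adding partition 2 evicts partition 1, since it is the least recently accessed. A static assignment, by contrast, would keep whichever partitions were chosen up front regardless of how often they are touched.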
