Posted to dev@giraph.apache.org by "Alessandro Presta (JIRA)" <ji...@apache.org> on 2012/08/15 18:42:38 UTC

[jira] [Updated] (GIRAPH-249) Move part of the graph out-of-core when memory is low

     [ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alessandro Presta updated GIRAPH-249:
-------------------------------------

    Attachment: GIRAPH-249.patch

I gave this another shot. This time it plays nicely with the input superstep: I replaced both the temporary partitions and the worker partition map with a common data structure, PartitionStore, held in ServerData. This is similar to what we now have for messages.
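
As a rough illustration of the intended abstraction (a minimal sketch with illustrative method names, not necessarily the exact ones in the patch; the import of Giraph's Partition class is omitted):

{code}
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

/**
 * Sketch of the common partition-holding abstraction described above.
 * Partition is Giraph's partition class (import omitted here).
 */
public abstract class PartitionStore<I extends WritableComparable,
    V extends Writable, E extends Writable, M extends Writable> {
  /** Add a whole partition (e.g. received during the input superstep). */
  public abstract void addPartition(Partition<I, V, E, M> partition);

  /** Get a partition by id, loading it from disk if necessary. */
  public abstract Partition<I, V, E, M> getPartition(int partitionId);

  /** Remove and return a partition. */
  public abstract Partition<I, V, E, M> removePartition(int partitionId);

  /** Ids of all partitions currently held by this worker. */
  public abstract Iterable<Integer> getPartitionIds();
}
{code}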

Partition is now thread-safe so that we can have concurrent calls to putVertex().

SimplePartitionStore is backed by a concurrent hash map (nothing new here, except that we skip copying partitions to the worker).
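
In simplified form it is little more than this (a sketch of the idea, not the patch code; it assumes Partition exposes getId(), getVertices() and the thread-safe putVertex() mentioned above, and the Giraph imports for Partition and Vertex are omitted):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

/** Sketch of the in-memory store: just a concurrent map of partitions. */
public class SimplePartitionStore<I extends WritableComparable,
    V extends Writable, E extends Writable, M extends Writable>
    extends PartitionStore<I, V, E, M> {
  private final ConcurrentMap<Integer, Partition<I, V, E, M>> partitions =
      new ConcurrentHashMap<Integer, Partition<I, V, E, M>>();

  @Override
  public void addPartition(Partition<I, V, E, M> partition) {
    // Assumes Partition exposes its id via getId().
    Partition<I, V, E, M> existing =
        partitions.putIfAbsent(partition.getId(), partition);
    if (existing != null) {
      // Merge into the partition already in the map: putVertex() is
      // thread-safe, so concurrent vertex requests can be handled directly.
      for (Vertex<I, V, E, M> vertex : partition.getVertices()) {
        existing.putVertex(vertex);
      }
    }
  }

  @Override
  public Partition<I, V, E, M> getPartition(int partitionId) {
    return partitions.get(partitionId);
  }

  @Override
  public Partition<I, V, E, M> removePartition(int partitionId) {
    return partitions.remove(partitionId);
  }

  @Override
  public Iterable<Integer> getPartitionIds() {
    return partitions.keySet();
  }
}
{code}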

DiskBackedPartitionStore can hold up to a user-defined number of partitions in memory, and spill the remaining ones to disk. Each partition is stored in a separate file.
Adding vertices to an out-of-core partition consists of appending them to the file, which makes processing vertex requests relatively fast.
We use a ReadWriteLock for each partition: performing operations on a partition held in memory only requires a read-lock (since Partition is thread-safe), while creating a new partition, moving it in or out of core, or appending vertices requires a write-lock (we can't have concurrent writes).
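
The locking boils down to the usual ReadWriteLock pattern, roughly like this (illustrative sketch only, with made-up names):

{code}
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Sketch of the per-partition locking used by the disk-backed store. */
public class PartitionLockSketch {
  /** One ReadWriteLock per partition. */
  private final ReadWriteLock lock = new ReentrantReadWriteLock();

  /** Operations on a partition held in memory only take the read lock,
      since Partition itself is thread-safe. */
  public void inMemoryOperation(Runnable operation) {
    Lock readLock = lock.readLock();
    readLock.lock();
    try {
      operation.run();
    } finally {
      readLock.unlock();
    }
  }

  /** Creating a partition, moving it in or out of core, or appending
      vertices to its file takes the write lock: no concurrent writes. */
  public void exclusiveOperation(Runnable operation) {
    Lock writeLock = lock.writeLock();
    writeLock.lock();
    try {
      operation.run();
    } finally {
      writeLock.unlock();
    }
  }
}
{code}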

Also note that this breaks Hadoop RPC: I preferred to keep it clean (this also shows what code we get rid of) instead of working around it. I suppose the Netty security patch will be completed soon. If not, I will restore RPC compatibility.

Here are some benchmarks that show where it can help:

First we run RandomMessageBenchmark with no messages (we are only interested in loading the graph) on trunk Giraph. We have 500k vertices, 1k edges per vertex, and 10 workers. Each JVM has 1.4G of total heap space according to the runtime.

{code}
hadoop jar giraph-trunk.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -v -V 500000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

By increasing the frequency of logging, I was able to see that the workers crash (out of heap space) after loading roughly 30k vertices. My interpretation is that we read more vertices than we can store on the client.

I then tried preventing the client from filling up by limiting the number of open requests to 10. In other words, we stop reading new vertices until we have fewer than 10 open requests.

{code}
hadoop jar giraph-trunk.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -Dgiraph.waitForRequestsConfirmation=true -Dgiraph.maxNumberOfOpenRequests=10 -v -V 500000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}
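
For reference, the throttling conceptually amounts to bounding the number of requests in flight, e.g. with a semaphore (a minimal sketch, not the actual Netty client code; the class and method names are made up for illustration):

{code}
import java.util.concurrent.Semaphore;

/** Sketch: block new sends while too many requests are still open. */
public class OpenRequestLimiter {
  private final Semaphore permits;

  public OpenRequestLimiter(int maxOpenRequests) {
    permits = new Semaphore(maxOpenRequests);
  }

  /** Called before sending a vertex request; blocks if the limit is hit. */
  public void beforeSend() throws InterruptedException {
    permits.acquire();
  }

  /** Called when the server confirms a request, freeing one slot. */
  public void onConfirmation() {
    permits.release();
  }
}
{code}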

This time we run out of memory after loading 15k vertices per worker. This may be a sign that we fill up the server with either incoming requests or temporary partitions.

Going out-of-core (with a maximum of 1 partition in memory), combined with the limit on open requests, does the trick:

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -Dgiraph.waitForRequestsConfirmation=true -Dgiraph.maxNumberOfOpenRequests=10 -Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=1 -v -V 500000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

This time the job completes, although quite slowly:

{code}
Total (milliseconds)	845,206
Setup (milliseconds)	2,857
Shutdown (milliseconds)	447
Vertex input superstep (milliseconds)	303,135
Superstep 0 (milliseconds)	177,757
Superstep 2 (milliseconds)	181,647
Superstep 1 (milliseconds)	179,360
{code}

As a performance test, I also ran an example that doesn't crash on trunk, by lowering the number of vertices to 100k.

{code}
hadoop jar giraph-trunk.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -v -V 100000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

{code}
Total (milliseconds)	23,852
Setup (milliseconds)	2,186
Shutdown (milliseconds)	224
Vertex input superstep (milliseconds)	21,003
Superstep 0 (milliseconds)	176
Superstep 2 (milliseconds)	126
Superstep 1 (milliseconds)	132
{code}

Without going out-of-core (i.e. using SimplePartitionStore), the new version maintains the same performance:

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -v -V 100000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

{code}
Total (milliseconds)	21,250
Setup (milliseconds)	2,897
Shutdown (milliseconds)	249
Vertex input superstep (milliseconds)	17,701
Superstep 0 (milliseconds)	153
Superstep 2 (milliseconds)	115
Superstep 1 (milliseconds)	131
{code}

I then tried going out-of-core to different extents, to measure the impact of reading from disk. This is with half of the graph out of core:

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=5 -v -V 100000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

{code}
Total (milliseconds)	95,328
Setup (milliseconds)	3,297
Shutdown (milliseconds)	248
Vertex input superstep (milliseconds)	36,009
Superstep 0 (milliseconds)	18,888
Superstep 2 (milliseconds)	18,289
Superstep 1 (milliseconds)	18,592
{code}

The input superstep takes a mere 2x hit, probably thanks to the append-only strategy. Loading the graph back in a regular superstep, on the other hand, is about two orders of magnitude slower.

Let's try going completely out of core (i.e. max 1 partition in memory):

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.RandomMessageBenchmark -Dgiraph.useNetty=true -Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=1 -v -V 100000 -e 1000 -n 0 -b 0 -w 10 -s 2
{code}

{code}
Total (milliseconds)	143,902
Setup (milliseconds)	1,825
Shutdown (milliseconds)	269
Vertex input superstep (milliseconds)	48,581
Superstep 0 (milliseconds)	31,217
Superstep 2 (milliseconds)	30,975
Superstep 1 (milliseconds)	31,032
{code}

Almost 2x slower than before, as expected.

Finally, I wanted to see how much the gap narrows when doing actual computation during the supersteps. I ran PageRankBenchmark on the same graph, first with trunk:

{code}
hadoop jar giraph-trunk.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -v -V 100000 -e 1000 -w 10 -s 2
{code}

{code}
Total (milliseconds)	68,692
Setup (milliseconds)	2,758
Vertex input superstep (milliseconds)	17,971
Shutdown (milliseconds)	237
Superstep 0 (milliseconds)	17,270
Superstep 2 (milliseconds)	412
Superstep 1 (milliseconds)	30,040
{code}

Then with SimplePartitionStore:

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -v -V 100000 -e 1000 -w 10 -s 2
{code}

{code}
Total (milliseconds)	52,724
Setup (milliseconds)	2,611
Shutdown (milliseconds)	225
Vertex input superstep (milliseconds)	19,984
Superstep 0 (milliseconds)	13,177
Superstep 2 (milliseconds)	357
Superstep 1 (milliseconds)	16,365
{code}

Then with DiskBackedPartitionStore, moving half of the graph out of core:

{code}
hadoop jar giraph-249.jar org.apache.giraph.benchmark.PageRankBenchmark -Dgiraph.useNetty=true -Dgiraph.useOutOfCoreGraph=true -Dgiraph.maxPartitionsInMemory=5 -v -V 100000 -e 1000 -w 10 -s 2
{code}

{code}
Total (milliseconds)	171,175
Setup (milliseconds)	2,216
Shutdown (milliseconds)	3,250
Vertex input superstep (milliseconds)	37,287
Superstep 0 (milliseconds)	32,035
Superstep 2 (milliseconds)	42,578
Superstep 1 (milliseconds)	53,805
{code}

This is roughly a 3x performance hit overall. Note that superstep 2 terminates instantly when not going out-of-core (no vertex is activated), whereas the out-of-core version still has to fetch the partitions once to know that the vertices are halted.
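
To get a feel for why, here is what the halt check looks like on top of the PartitionStore sketch above (illustrative only; Giraph imports omitted):

{code}
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;

/** Sketch: checking that all vertices are halted forces a pass over
    every partition, i.e. one disk read per spilled partition. */
public class HaltCheckSketch {
  public static <I extends WritableComparable, V extends Writable,
      E extends Writable, M extends Writable> boolean allHalted(
      PartitionStore<I, V, E, M> store) {
    for (Integer partitionId : store.getPartitionIds()) {
      // May read the partition back from disk in DiskBackedPartitionStore
      Partition<I, V, E, M> partition = store.getPartition(partitionId);
      for (Vertex<I, V, E, M> vertex : partition.getVertices()) {
        if (!vertex.isHalted()) {
          return false;
        }
      }
    }
    return true;
  }
}
{code}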

                
> Move part of the graph out-of-core when memory is low
> -----------------------------------------------------
>
>                 Key: GIRAPH-249
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-249
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Alessandro Presta
>            Assignee: Alessandro Presta
>         Attachments: GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch, GIRAPH-249.patch
>
>
> There has been some talk about Giraph's scaling limitations due to keeping the whole graph and messages in RAM.
> We need to investigate methods to fall back to disk when running out of memory, while gracefully degrading performance.
> This issue is for graph storage. Messages should probably be a separate issue, although the interplay between the two is crucial.
> We should also discuss what our primary goals are here: completing a job (albeit slowly) instead of failing when the graph is too big, while still encouraging memory optimizations and high-memory clusters; or restructuring Giraph to be as efficient as possible in disk mode, making it almost a standard way of operating.
