You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@giraph.apache.org by cl...@apache.org on 2013/05/08 00:53:33 UTC

[1/3] git commit: updated refs/heads/trunk to 5922de1

Updated Branches:
  refs/heads/trunk 04bf3d644 -> 5922de147


GIRAPH-622


Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/b053658a
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/b053658a
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/b053658a

Branch: refs/heads/trunk
Commit: b053658abefd676b703e7567595e4b8d89b765c4
Parents: 190974b
Author: Claudio Martella <cl...@apache.org>
Authored: Tue May 7 19:45:24 2013 +0200
Committer: Claudio Martella <cl...@apache.org>
Committed: Tue May 7 19:45:24 2013 +0200

----------------------------------------------------------------------
 src/site/site.xml     |    1 +
 src/site/xdoc/ooc.xml |   51 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+), 0 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/giraph/blob/b053658a/src/site/site.xml
----------------------------------------------------------------------
diff --git a/src/site/site.xml b/src/site/site.xml
index d87bb36..039a7e8 100644
--- a/src/site/site.xml
+++ b/src/site/site.xml
@@ -77,6 +77,7 @@
       <item name="Page Rank Example" href="pagerank.html"/>
       <item name="Input/Output in Giraph" href="io.html"/>
       <item name="Aggregators" href="aggregators.html"/>
+      <item name="Out-of-core" href="ooc.html"/>
       <item name="Javadoc" href="javadoc_modules.html"/>
       <item name="Presentations" href="presentations.html"/>
       <item name="External Community Wiki" href="https://cwiki.apache.org/confluence/display/GIRAPH" />

http://git-wip-us.apache.org/repos/asf/giraph/blob/b053658a/src/site/xdoc/ooc.xml
----------------------------------------------------------------------
diff --git a/src/site/xdoc/ooc.xml b/src/site/xdoc/ooc.xml
new file mode 100644
index 0000000..2887a0d
--- /dev/null
+++ b/src/site/xdoc/ooc.xml
@@ -0,0 +1,51 @@
+<?xml version="1.0" encoding="UTF-8"?>
+
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+<document xmlns="http://maven.apache.org/XDOC/2.0"
+	  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+	  xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
+  <properties>
+    <title>Out-of-core Giraph</title>
+  </properties>
+  
+  <body>
+    <section name="Running Giraph with support of external memory">
+      <p>Like Pregel, Giraph was originally designed to run the whole computation in-memory, and to touch disks only for input/output and checkpointing. Unfortunately, sometimes a graph is too large to fit into a cluster's main memory (or the cluster is too small for a graph). Moreover, certain algorithms produce too large message sets (either producing many messages or very large ones), again not fitting into the cluster's memory. For these scenarios, Giraph was extended with out-of-core capabilities, meaning that Giraph is able to spill to disk excessive messages or partitions, according to user-defined parameters.
+      </p>
+    </section>
+    <section name="Out-of-core Graph">
+      <p>When running out-of-core graph, Giraph will keep only a limited number of partitions in memory, while the others will be stored to local disk(s), using an LRU policy to decide which partitions to swap. This feature can be enabled with parameter "giraph.useOutOfCoreGraph=true" (disabled by default), while the number of partitions is controlled by parameter "giraph.maxPartitionsInMemory=N" (with default value 10). The partitions will be kept in local disk(s) for fast i/o, and the destination directory is controlled by parameter "giraph.partitionsDirectory" (with default "_bsp/_partitions"). This is a fairly efficient feature, as all the operations produce sequential i/o, without random access to disk(s).
+      </p>
+      <p>For cluster with multiple disks per machine, the out-of-core partitions will be distributed across all the disks. This feature can be enabled by passing "giraph.partitionsDirectory" a comma-separated list of paths. The files will be distributed in a round-robin fashion.
+      </p>      
+      <p>For computations where the graph is static, meaning that no edges or vertices are added or removed during the computation (e.g. PageRank, SSSP, etc.), it is a waste of i/o operations to write back to disk the adjacency list of each vertex. With parameter "giraph.isStaticGraph=true" (disabled by default), when offloading a partition to disk, Giraph will write back only the vertices' values, saving the i/o produced by writing back also the the adjacency lists.
+      </p>
+    </section>
+    <section name="Out-of-core Messages">
+      <p>When running out-of-core messages, Giraph will keep only a limited number of messages in memory, while the others will be stored to local disk(s). This feature can be enabled with parameter "giraph.useOutOfCoreMessages=true" (disabled by default), while the number of messages is controlled by parameter "giraph.maxMessagesInMemory=N" (with default value 1000000). With this feature, Giraph will keep in memory the incoming messages into an in-memory store. When the store exceeds the chosen number of messages, the content of the store will be spilled to disk, and a new empty in-memory store will be instantiated. This process produces a number of files on disk, depending on the number of messages produced during a superstep. During the vertex computation the files will be read sequentially, and the messages for each vertex will be concatenated and fed to the vertex. Both for reading and writing, files are accessed sequentially.
+      </p>
+        <p>Also out-of-core messages can take advantage of multiple disks, as parameter "giraph.messagesDirectory" (with default "_bsp/_messages/") can accept a comma-separated list of paths. It is possible to control the buffers used for i/o with parameter "giraph.messagesBufferSize=#Bytes" (with default value 8192).
+      </p>
+    </section>
+      <p>It is difficult to decide a general policy to use out-of-core capabilities, as it depends on the behavior of the algorithm and the input graph. The exact number of partitions and messages to keep in memory depends on the cluster capabilities, the number of messages produced per superstep, and number of active vertices per superstep. Moreover, it depends on the type and size of vertex values and messages. For example, algorithms such as Belief Propagation tend to keep large vertex values, while algorithms such as clique computations tend to send large messages along. Hence, it depends on your algorithm what feature to rely on more. 
+      </p>
+  </body>
+</document>


[3/3] git commit: updated refs/heads/trunk to 5922de1

Posted by cl...@apache.org.
GIRAPH-622


Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/5922de14
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/5922de14
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/5922de14

Branch: refs/heads/trunk
Commit: 5922de1476454fa9e7462aabacb2f3f2dd3e162d
Parents: ff02c5d
Author: Claudio Martella <cl...@apache.org>
Authored: Wed May 8 00:52:55 2013 +0200
Committer: Claudio Martella <cl...@apache.org>
Committed: Wed May 8 00:52:55 2013 +0200

----------------------------------------------------------------------
 CHANGELOG |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/giraph/blob/5922de14/CHANGELOG
----------------------------------------------------------------------
diff --git a/CHANGELOG b/CHANGELOG
index c62c3dc..f828e78 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,6 +1,8 @@
 Giraph Change Log
 
 Release 1.1.0 - unreleased
+  GIRAPH-622: Website Documentation: Out-of-core (claudio)
+
   GIRAPH-659: giraph-hive tests all fail (majakabiljo)
 
   GIRAPH-663: Fix HiveIO metastore host setting (nitay)


[2/3] git commit: updated refs/heads/trunk to 5922de1

Posted by cl...@apache.org.
Merge branch 'trunk' of https://git-wip-us.apache.org/repos/asf/giraph into trunk


Project: http://git-wip-us.apache.org/repos/asf/giraph/repo
Commit: http://git-wip-us.apache.org/repos/asf/giraph/commit/ff02c5d6
Tree: http://git-wip-us.apache.org/repos/asf/giraph/tree/ff02c5d6
Diff: http://git-wip-us.apache.org/repos/asf/giraph/diff/ff02c5d6

Branch: refs/heads/trunk
Commit: ff02c5d6cf98c86f2aff5cf21206b1bdfabab4f9
Parents: b053658 04bf3d6
Author: Claudio Martella <cl...@apache.org>
Authored: Wed May 8 00:51:51 2013 +0200
Committer: Claudio Martella <cl...@apache.org>
Committed: Wed May 8 00:51:51 2013 +0200

----------------------------------------------------------------------
 CHANGELOG                                          |    4 ++++
 .../io/internal/WrappedVertexOutputFormat.java     |    8 ++++----
 .../giraph/hive/common/HiveInputOptions.java       |    2 +-
 pom.xml                                            |    3 +--
 4 files changed, 10 insertions(+), 7 deletions(-)
----------------------------------------------------------------------