Posted to user@giraph.apache.org by David Garcia <dg...@potomacfusion.com> on 2012/02/01 06:34:30 UTC

Caching (with LRU or something) strategy in Giraph?

I haven't investigated too deeply into this. . .but is there a caching strategy implemented, or in the works, for getting around having to load all of a split's vertices into memory?  If a graph is large enough, even a reasonably sized cluster may not have enough memory to load all the vertices.  Does Giraph address this currently?

-David

Re: Caching (with LRU or something) strategy in Giraph?

Posted by Jakob Homan <jg...@gmail.com>.

David-
   Please take a look at GIRAPH-45 for this discussion.
-Jakob


On Tue, Jan 31, 2012 at 9:34 PM, David Garcia <dg...@potomacfusion.com> wrote:
> I haven't investigated too deeply into this. . .but is there a caching
> strategy implemented, or in the works, for getting around having to load all
> of a split's vertices into memory?  If a graph is large enough, even a
> reasonably sized cluster may not have enough memory to load all the
> vertices.  Does Giraph address this currently?
>
> -David

Re: Caching (with LRU or something) strategy in Giraph?

Posted by David Garcia <dg...@potomacfusion.com>.

Hey Jake, thanks for the reply.  I'll look at GIRAPH-45 for this particular topic.  Really quick though: I thought Pregel was an implementation of BSP, a programming model that is orthogonal to how data is retrieved and stored.  It seems quite reasonable to implement a basic caching strategy for the case where a worker's vertices don't all fit in memory.  Thanks again for your input.  I'll direct my question to the GIRAPH-45 topic.
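
To make the caching idea concrete, here is a minimal sketch of a per-worker LRU cache for vertices. It is only an illustration under stated assumptions, not Giraph code: the class name VertexLruCache, the maxInMemory limit, and the spill callback (standing in for "write the evicted vertex to local disk") are all hypothetical. It relies only on the access-ordered mode of java.util.LinkedHashMap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical sketch, not part of Giraph: keeps at most maxInMemory
// vertices in memory and hands the least recently used one to a
// spill callback when the limit is exceeded.
class VertexLruCache<I, V> extends LinkedHashMap<I, V> {
    private final int maxInMemory;          // assumed per-worker limit
    private final BiConsumer<I, V> spill;   // assumed hook, e.g. write to disk

    VertexLruCache(int maxInMemory, BiConsumer<I, V> spill) {
        // accessOrder = true: entries are ordered from least to most
        // recently accessed, which is what LRU eviction needs.
        super(16, 0.75f, true);
        this.maxInMemory = maxInMemory;
        this.spill = spill;
    }

    // Called by LinkedHashMap after each insertion; returning true
    // removes the eldest (least recently used) entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<I, V> eldest) {
        if (size() > maxInMemory) {
            spill.accept(eldest.getKey(), eldest.getValue());
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<Long> spilled = new ArrayList<>();
        VertexLruCache<Long, String> cache =
            new VertexLruCache<>(2, (id, value) -> spilled.add(id));
        cache.put(1L, "a");
        cache.put(2L, "b");
        cache.get(1L);       // touch vertex 1; vertex 2 becomes least recently used
        cache.put(3L, "c");  // over the limit: vertex 2 is spilled
        // prints: in memory: [1, 3], spilled: [2]
        System.out.println("in memory: " + cache.keySet() + ", spilled: " + spilled);
    }
}
```

A real version would also have to reload spilled vertices when messages arrive for them, which is where the cost Jake describes would show up.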

-David

From: Jake Mannix <ja...@gmail.com>
Reply-To: "giraph-user@incubator.apache.org" <gi...@incubator.apache.org>
Date: Wed, 1 Feb 2012 00:01:02 -0600
To: "giraph-user@incubator.apache.org" <gi...@incubator.apache.org>
Subject: Re: Caching (with LRU or something) strategy in Giraph?

Hi David,

  The *point* of the Pregel architecture (which Giraph is an implementation of) is that the whole graph is in (distributed) memory.  If you are willing to go to disk, doing your calculations via MapReduce (possibly talking to a distributed hashtable of some kind colocated with your Hadoop cluster, if it helps) is the straightforward way to go.

  -jake

On Tue, Jan 31, 2012 at 9:34 PM, David Garcia <dg...@potomacfusion.com> wrote:
I haven't investigated too deeply into this. . .but is there a caching strategy implemented, or in the works, for getting around having to load all of a split's vertices into memory?  If a graph is large enough, even a reasonably sized cluster may not have enough memory to load all the vertices.  Does Giraph address this currently?

-David

Re: Caching (with LRU or something) strategy in Giraph?

Posted by Jake Mannix <ja...@gmail.com>.

Hi David,

  The *point* of the Pregel architecture (which Giraph is an implementation
of) is that the whole graph is in (distributed) memory.  If you are willing
to go to disk, doing your calculations via MapReduce (possibly talking to a
distributed hashtable of some kind colocated with your Hadoop cluster, if
it helps) is the straightforward way to go.

  -jake

On Tue, Jan 31, 2012 at 9:34 PM, David Garcia <dg...@potomacfusion.com> wrote:

> I haven't investigated too deeply into this. . .but is there a caching
> strategy implemented, or in the works, for getting around having to load
> all of a split's vertices into memory?  If a graph is large enough, even a
> reasonably sized cluster may not have enough memory to load all the
> vertices.  Does Giraph address this currently?
>
> -David
>