
Posted to user@giraph.apache.org by Gavan Hood <gw...@simul-tech.com> on 2012/01/23 23:29:52 UTC

How is this use case supported

Hi all,

I have been wondering how Giraph can support a large graph that is constantly being updated by multiple jobs running simultaneously.
The output of these jobs continually adds new and modifies existing edges / vertices in the graph. Some notion of transactional concurrency would be needed as well in this environment.

From what I can see, Giraph may be well suited to working with snapshots of such a system rather than the root implementation, but I feel that I might be missing a core design pattern.

Regards
Gavan

Re: How is this use case supported

Posted by Avery Ching <ac...@apache.org>.

There was another question related to this in the recent past as well.

Avery

http://mail-archives.apache.org/mod_mbox/incubator-giraph-user/201201.mbox/%3CCAM7FQQutjTXVUiqALTTGpEad%3D7AE0MmQ2wmqzTEavHJUoohbMA%40mail.gmail.com%3E

On 1/23/12 3:10 PM, Gavan Hood wrote:
> Yes, thanks Claudio, that is my impression as well. Thanks for the
> confirmation.
>
> -----Original Message-----
> From: Claudio Martella [mailto:claudio.martella@gmail.com]
> Sent: Tuesday, 24 January 2012 8:38 AM
> To: giraph-user@incubator.apache.org
> Subject: Re: How is this use case supported

RE: How is this use case supported

Posted by Gavan Hood <gw...@simul-tech.com>.

Yes, thanks Claudio, that is my impression as well. Thanks for the
confirmation.

Re: How is this use case supported

Posted by Claudio Martella <cl...@gmail.com>.

Hi Gavan,

Giraph is a batch processing engine, not a DB. What you would do is the
same as what you would do with MapReduce: as you said, you input a snapshot
of your constantly changing graph to Giraph and work later with what's
coming out in your pipeline. Personally, I don't see room for
transactions inside of Giraph; you'd have to manage them yourself, using
its output to update your DB.

Does it help?

Best,
Claudio

-- 
   Claudio Martella
   claudio.martella@gmail.com
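A minimal sketch of the snapshot-then-apply workflow Claudio describes, in plain Java. Everything here is hypothetical: `LiveGraphStore`, `Snapshot`, `applyResults` and the per-vertex version counter are illustrative names, not Giraph or database APIs, and the in-memory `connectedComponents` loop only stands in for a real Giraph job (it mimics supersteps by reading the previous round's labels and writing new ones). The transactional part lives outside the batch engine, as suggested above: results computed from a snapshot that has since gone stale are rejected by an optimistic version check and left for the next run.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these classes are Giraph APIs.
final class SnapshotPipelineSketch {

  /** Hypothetical live store: undirected adjacency plus a version bumped on every write. */
  static final class LiveGraphStore {
    final Map<Long, Set<Long>> adjacency = new HashMap<>();
    final Map<Long, Long> versions = new HashMap<>();
    final Map<Long, Long> componentLabel = new HashMap<>(); // where batch output lands

    synchronized void addEdge(long a, long b) {
      adjacency.computeIfAbsent(a, k -> new HashSet<>()).add(b);
      adjacency.computeIfAbsent(b, k -> new HashSet<>()).add(a);
      versions.merge(a, 1L, Long::sum);
      versions.merge(b, 1L, Long::sum);
    }

    /** Deep-copies the graph and its versions so the batch job sees a frozen input. */
    synchronized Snapshot snapshot() {
      Map<Long, Set<Long>> adj = new HashMap<>();
      adjacency.forEach((v, ns) -> adj.put(v, new HashSet<>(ns)));
      return new Snapshot(adj, new HashMap<>(versions));
    }

    /**
     * Optimistic apply: a result is written only if its vertex is unchanged since
     * the snapshot. Returns how many vertices were rejected as stale.
     */
    synchronized int applyResults(Snapshot snap, Map<Long, Long> labels) {
      int rejected = 0;
      for (Map.Entry<Long, Long> e : labels.entrySet()) {
        long v = e.getKey();
        if (versions.get(v).equals(snap.versions.get(v))) {
          componentLabel.put(v, e.getValue());
        } else {
          rejected++; // vertex changed after the snapshot; recompute in the next run
        }
      }
      return rejected;
    }
  }

  /** Frozen input handed to the batch step. */
  static final class Snapshot {
    final Map<Long, Set<Long>> adjacency;
    final Map<Long, Long> versions;

    Snapshot(Map<Long, Set<Long>> adjacency, Map<Long, Long> versions) {
      this.adjacency = adjacency;
      this.versions = versions;
    }
  }

  /** Stand-in batch job: min-label propagation (connected components), one loop pass per superstep. */
  static Map<Long, Long> connectedComponents(Map<Long, Set<Long>> adjacency) {
    Map<Long, Long> label = new HashMap<>();
    adjacency.keySet().forEach(v -> label.put(v, v));
    boolean changed = true;
    while (changed) {
      changed = false;
      Map<Long, Long> next = new HashMap<>(label); // read old labels, write new ones
      for (Map.Entry<Long, Set<Long>> e : adjacency.entrySet()) {
        for (long n : e.getValue()) {
          if (label.get(n) < next.get(e.getKey())) {
            next.put(e.getKey(), label.get(n));
            changed = true;
          }
        }
      }
      label.clear();
      label.putAll(next);
    }
    return label;
  }

  public static void main(String[] args) {
    LiveGraphStore store = new LiveGraphStore();
    store.addEdge(1, 2);
    store.addEdge(2, 3);
    store.addEdge(4, 5);

    Snapshot snap = store.snapshot();                             // 1. freeze the input
    Map<Long, Long> labels = connectedComponents(snap.adjacency); // 2. run the batch job
    store.addEdge(3, 4);                                          // a concurrent writer touches 3 and 4
    int rejected = store.applyResults(snap, labels);              // 3. apply output, skip stale vertices
    System.out.println(labels + " rejected=" + rejected);
  }
}
```

In the `main` example, the edge added between vertices 3 and 4 after the snapshot bumps their versions, so `applyResults` rejects those two vertices and writes labels only for 1, 2 and 5. Rejecting stale rows is just one policy; a real pipeline might instead merge, retry, or overwrite, depending on what the downstream DB needs.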