
Posted to user@giraph.apache.org by Vincentius Martin <vi...@gmail.com> on 2014/11/10 05:00:11 UTC

When do Giraph vertices receive their messages?

I am curious about how Giraph receives messages before processing them.

I know that vertices use the messages they have accepted in the compute()
method in the next superstep, but when do they actually receive them? If it
happens before the checkpoint process, is there any part of the
documentation/code that I can look at to understand it?

Also, what mechanism does Giraph use to store messages before superstep
S+1? Are they stored in a buffer or on disk first?

I still cannot find anything about this.

Regards,
Vincentius Martin

Re: When do Giraph vertices receive their messages?

Posted by Vincentius Martin <vi...@gmail.com>.

I have checked, and what I found is just as you said.

Senders flush everything at the end of the superstep. They put all of the
messages from remote partitions, with their partition IDs, into the server
data's incoming message store. The messages are then added to the buffer
and flushed when the buffer is full.

So, from what I understand, it seems that the transfer itself is done at
the end of the superstep. Please tell me if any part of what I wrote is
incorrect.
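The buffering behaviour described above can be sketched as follows. This is a minimal, hypothetical model and not Giraph's code. It assumes one buffer per destination worker, a fixed message-count threshold, and a simple `vertexId mod numWorkers` routing rule. `SendSideBuffer`, `sendMessage`, `flushAll` and the threshold are invented for illustration. The real sending path lives in Giraph's Netty client code (for example NettyWorkerClientRequestProcessor).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of sender-side message buffering; not Giraph's actual classes.
class SendSideBuffer {
  /** A message addressed to a destination vertex. */
  static final class Message {
    final long targetVertexId;
    final double value;

    Message(long targetVertexId, double value) {
      this.targetVertexId = targetVertexId;
      this.value = value;
    }
  }

  private final int numWorkers;
  private final int flushThreshold;
  // One pending buffer per destination worker.
  private final Map<Integer, List<Message>> pending = new HashMap<>();
  // Batches that were "sent over the network", kept so the behaviour can be inspected.
  final List<List<Message>> sentBatches = new ArrayList<>();

  SendSideBuffer(int numWorkers, int flushThreshold) {
    this.numWorkers = numWorkers;
    this.flushThreshold = flushThreshold;
  }

  /** Assumed routing rule for this sketch: destination worker = vertexId mod numWorkers. */
  private int workerFor(long vertexId) {
    return (int) Math.floorMod(vertexId, (long) numWorkers);
  }

  /** Called from compute(): buffer the message and send a batch once that buffer is full. */
  void sendMessage(long targetVertexId, double value) {
    int worker = workerFor(targetVertexId);
    List<Message> buffer = pending.computeIfAbsent(worker, w -> new ArrayList<>());
    buffer.add(new Message(targetVertexId, value));
    if (buffer.size() >= flushThreshold) {
      sendBatch(worker);
    }
  }

  /** Called once at the end of the superstep: send whatever is still buffered. */
  void flushAll() {
    for (int worker : new ArrayList<>(pending.keySet())) {
      sendBatch(worker);
    }
  }

  private void sendBatch(int worker) {
    List<Message> buffer = pending.remove(worker);
    if (buffer != null && !buffer.isEmpty()) {
      sentBatches.add(buffer); // stands in for one network request to `worker`
    }
  }

  int pendingCount() {
    return pending.values().stream().mapToInt(List::size).sum();
  }

  public static void main(String[] args) {
    SendSideBuffer buf = new SendSideBuffer(2, 3);
    for (long v = 0; v < 7; v++) {
      buf.sendMessage(v, 1.0);
    }
    // Both workers' buffers filled up once during "compute"; vertex 6's message is still pending.
    System.out.println(buf.sentBatches.size() + " batches, " + buf.pendingCount() + " pending");
    buf.flushAll(); // end of superstep
    System.out.println(buf.sentBatches.size() + " batches, " + buf.pendingCount() + " pending");
  }
}
```

Running `main` prints `2 batches, 1 pending` and then `3 batches, 0 pending`. In this model, most messages leave the sender while compute() is still running. Only the remainder waits for the end-of-superstep flush.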

Therefore, I still don't understand why the documentation says that the
workers receive everything at the beginning of the next superstep. I still
can't find anything in the code or logs indicating that the workers/vertices
are receiving the messages at that point.

All I know is that, at the beginning of each superstep, the master assigns
partitions to the workers and the workers exchange vertex partitions.
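Here is a hypothetical sketch of the partition bookkeeping mentioned above, assuming hash partitioning. A vertex ID is hashed to a partition, and the master maps each partition to a worker (round-robin here). The class name, the round-robin policy and the `reassign` method are assumptions for illustration. Giraph's partitioning is pluggable, and its defaults may differ. This mapping is what tells a sender which worker a message has to go to.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical hash-based partitioning; names and policies are illustrative assumptions.
class PartitionAssignment {
  private final int numPartitions;
  // partitionId -> workerId, as decided by the master for the current superstep.
  private final Map<Integer, Integer> partitionOwner = new HashMap<>();

  PartitionAssignment(int numPartitions, int numWorkers) {
    this.numPartitions = numPartitions;
    // Assumed policy: spread partitions round-robin over the workers.
    for (int p = 0; p < numPartitions; p++) {
      partitionOwner.put(p, p % numWorkers);
    }
  }

  /** Vertex -> partition: hash the ID and take a non-negative modulus. */
  int partitionOf(long vertexId) {
    return Math.floorMod(Long.hashCode(vertexId), numPartitions);
  }

  /** Vertex -> worker: look up the current owner of the vertex's partition. */
  int workerOf(long vertexId) {
    return partitionOwner.get(partitionOf(vertexId));
  }

  /** The master moves a partition; old and new owners would then exchange its vertices. */
  void reassign(int partitionId, int newWorker) {
    partitionOwner.put(partitionId, newWorker);
  }

  public static void main(String[] args) {
    PartitionAssignment a = new PartitionAssignment(4, 2);
    System.out.println("vertex 5 -> worker " + a.workerOf(5));
    a.reassign(1, 0);
    System.out.println("vertex 5 -> worker " + a.workerOf(5));
  }
}
```

Vertex 5 hashes to partition 1, which is first owned by worker 1. After the master reassigns partition 1, the same vertex is routed to worker 0.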


Regards,
Vincentius Martin

In reply to Matthew Saltz <sa...@gmail.com>, Mon, Nov 10, 2014 at 6:46 PM.

Re: When do Giraph vertices receive their messages?

Posted by Matthew Saltz <sa...@gmail.com>.

Hi Vincentius,

I'd recommend checking out the code in the call() method of this class
<https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/graph/ComputeCallable.java>
to try to follow the logic that occurs during computation in a superstep, as
well as the code
<https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyWorkerClientRequestProcessor.java>
for handling message sending, and the execute method in GraphTaskManager
<https://github.com/apache/giraph/blob/trunk/giraph-core/src/main/java/org/apache/giraph/graph/GraphTaskManager.java>,
which basically handles the overall control flow of everything. I've found
that with Giraph you will, at some point, more or less need to dig through
the code to figure out what's going on behind the scenes. The call() and
computePartition() methods in ComputeCallable are pretty enlightening. As
far as messaging goes, it appears that everything is flushed from the
sender before the end of the superstep.
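The point above, that everything is flushed from the sender before the superstep ends, can be sketched with asynchronous sends that the worker tracks and waits for. This is a hypothetical single-worker model, not Giraph's implementation. The names are invented, a thread pool stands in for the network, and the cross-worker barrier that follows is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical single-worker model of "flush before the superstep ends"; not Giraph code.
class AsyncSendTracker {
  // Stands in for the receiving worker's incoming message store (thread-safe).
  final ConcurrentLinkedQueue<String> incoming = new ConcurrentLinkedQueue<>();
  // Handles for sends that may not have been delivered yet.
  private final List<CompletableFuture<Void>> outstanding = new ArrayList<>();
  // A small thread pool plays the role of the network layer.
  private final ExecutorService network = Executors.newFixedThreadPool(2);

  /** Called from compute(): start the send and return immediately. */
  void sendAsync(String message) {
    outstanding.add(CompletableFuture.runAsync(() -> incoming.add(message), network));
  }

  /** End of superstep: block until every outstanding send has been delivered. */
  void waitForAllRequests() {
    CompletableFuture.allOf(outstanding.toArray(new CompletableFuture<?>[0])).join();
    outstanding.clear();
  }

  void shutdown() {
    network.shutdown();
  }

  public static void main(String[] args) {
    AsyncSendTracker worker = new AsyncSendTracker();
    for (int i = 0; i < 100; i++) {
      worker.sendAsync("msg-" + i);
    }
    worker.waitForAllRequests();
    // Only now would the worker tell the master that its superstep is finished.
    System.out.println("delivered " + worker.incoming.size() + " messages");
    worker.shutdown();
  }
}
```

`waitForAllRequests()` is what makes the guarantee hold in this model. Only after it returns may the worker report that it has finished the superstep.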

Someone else please correct me if I'm wrong about any of these things; I
don't want to mislead anyone.

Best,
Matthew Saltz



In reply to Vincentius Martin <vincentiusmartin@gmail.com>, Mon, Nov 10, 2014 at 12:18 PM.

Re: When do Giraph vertices receive their messages?

Posted by Vincentius Martin <vi...@gmail.com>.

Hi XingFeng, thanks for your answer!

Yes, I have already read the Pregel paper; unfortunately, there are some
specific steps that I still can't grasp.

So, when does the checkpoint happen? Is it before or after step 1 (the
message-receiving phase) in your explanation?

Also, from your explanation, I deduce that at the beginning of each
superstep the messages are still in the sender workers' buffers, and each
sender worker sends them during this phase. Am I right?


Regards,
Vincentius Martin

In reply to XingFENG <xi...@cse.unsw.edu.au>, Mon, Nov 10, 2014 at 5:49 PM.

Re: When do Giraph vertices receive their messages?

Posted by XingFENG <xi...@cse.unsw.edu.au>.

Hi Vincentius Martin,

Since Giraph is based on Pregel, I would refer you to the paper *Pregel: A
System for Large-Scale Graph Processing* for more details.

Briefly speaking, in each superstep:
1. A worker (which is responsible for a partition of vertices) receives
messages from other workers. It then divides these messages by destination
ID and activates the vertices that have incoming messages.
2. A worker runs the *compute* function of each active vertex. Meanwhile,
*compute* may generate messages to other vertices. These messages are
buffered, combined, and sent asynchronously in batches.
3. After a worker finishes running *compute* on all active vertices, it
waits for all other workers to finish their *compute* functions. It also
waits for all sending tasks to finish, to ensure that all messages can be
received in the next superstep. Then every worker moves on to the next
superstep.
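The three steps above can be sketched as a single-process simulation. It assumes a Pregel-style model in which messages produced during superstep S are written into an `incoming` store, grouped by destination vertex ID. They only become visible to compute() after the barrier, when that store is swapped in as the `current` one. Everything here is hypothetical: the names are invented, and there is no networking, no combiner and no checkpointing. The algorithm is the maximum-value propagation example from the Pregel paper.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-process model of Pregel-style supersteps; not Giraph code.
// Example algorithm: every vertex converges to the maximum value in its component.
class SuperstepSimulation {
  final Map<Long, Long> value = new HashMap<>();
  final Map<Long, List<Long>> neighbors = new HashMap<>();
  final Map<Long, Boolean> halted = new HashMap<>();

  void addVertex(long id, long initialValue, long... outNeighbors) {
    value.put(id, initialValue);
    List<Long> list = new ArrayList<>();
    for (long n : outNeighbors) {
      list.add(n);
    }
    neighbors.put(id, list);
    halted.put(id, false);
  }

  /** Runs supersteps until all vertices halted and no messages are in flight; returns the count. */
  int run() {
    // Messages readable in this superstep, grouped by destination vertex ID.
    Map<Long, List<Long>> current = new HashMap<>();
    // Messages produced in this superstep; invisible to compute() until the swap below.
    Map<Long, List<Long>> incoming = new HashMap<>();
    int superstep = 0;
    while (true) {
      for (long id : value.keySet()) {
        List<Long> messages = current.getOrDefault(id, List.of());
        // Step 1: an incoming message re-activates a halted vertex.
        if (!messages.isEmpty()) {
          halted.put(id, false);
        }
        if (halted.get(id)) {
          continue;
        }
        // Step 2: compute(); new messages go to `incoming`, never to `current`.
        long max = value.get(id);
        for (long m : messages) {
          max = Math.max(max, m);
        }
        if (superstep == 0 || max > value.get(id)) {
          value.put(id, max);
          for (long n : neighbors.get(id)) {
            incoming.computeIfAbsent(n, k -> new ArrayList<>()).add(max);
          }
        }
        halted.put(id, true); // vote to halt
      }
      // Step 3: barrier. Stop when every vertex has halted and nothing is in flight.
      if (incoming.isEmpty() && !halted.containsValue(false)) {
        return superstep + 1;
      }
      // Swap: what was sent during superstep S is what compute() reads in S + 1.
      current = incoming;
      incoming = new HashMap<>();
      superstep++;
    }
  }

  public static void main(String[] args) {
    SuperstepSimulation sim = new SuperstepSimulation();
    sim.addVertex(1, 3, 2);
    sim.addVertex(2, 6, 1, 3);
    sim.addVertex(3, 2, 2);
    int supersteps = sim.run();
    System.out.println(supersteps + " supersteps, values " + sim.value);
  }
}
```

In this model, a message may arrive at any time during superstep S, yet compute() only sees it in S + 1, after the swap. On the three-vertex chain in `main`, every value becomes 6 after 3 supersteps.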

As for your second question, messages are stored in a buffer.
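Step 2 says outgoing messages are buffered and combined. Here is a minimal sketch of a combining buffer. It assumes the combine operation is commutative and associative (sum or max, for example), so the order in which messages arrive does not matter. `CombiningBuffer` and its methods are hypothetical. Giraph lets users plug in their own combiner, but its interface is not reproduced here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical sender-side buffer that keeps one combined message per destination vertex.
class CombiningBuffer {
  private final BinaryOperator<Long> combiner; // assumed commutative and associative
  private final Map<Long, Long> buffer = new HashMap<>(); // destination vertex -> combined message
  private int messagesSent = 0;

  CombiningBuffer(BinaryOperator<Long> combiner) {
    this.combiner = combiner;
  }

  /** Called from compute(): fold the message into whatever is already buffered for that vertex. */
  void send(long targetVertexId, long message) {
    messagesSent++;
    buffer.merge(targetVertexId, message, combiner);
  }

  int messagesSent() {
    return messagesSent;
  }

  int bufferedEntries() {
    return buffer.size();
  }

  Long combinedFor(long targetVertexId) {
    return buffer.get(targetVertexId);
  }

  public static void main(String[] args) {
    CombiningBuffer buf = new CombiningBuffer(Long::sum);
    buf.send(7, 1);
    buf.send(7, 2);
    buf.send(7, 4);
    buf.send(9, 10);
    System.out.println(buf.messagesSent() + " messages -> " + buf.bufferedEntries() + " entries");
    System.out.println("vertex 7 receives " + buf.combinedFor(7));
  }
}
```

With `Long::sum`, four messages to two vertices leave the buffer as two entries, and vertex 7 receives a single message with the value 7.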

In reply to Puneet Agarwal <pu...@yahoo.com>, Mon, Nov 10, 2014 at 6:14 PM.

-- 
Best Regards.
---
Xing FENG
PhD Candidate
Database Research Group

School of Computer Science and Engineering
University of New South Wales
NSW 2052, Sydney

Phone: (+61) 413 857 288

Re: When do Giraph vertices receive their messages?

Posted by Puneet Agarwal <pu...@yahoo.com>.

These are very interesting questions. I would also like to know the answers.

- Puneet
IIT Delhi, India
 

In reply to Vincentius Martin <vi...@gmail.com>, Monday, November 10, 2014 9:30 AM.