Posted to user@giraph.apache.org by prasenjit mukherjee <pr...@gmail.com> on 2012/06/17 07:54:04 UTC

Some giraph implementation questions..

1. Is it the master that ensures the sendToNeighbours() call actually
succeeds, or is that handled entirely by the underlying Hadoop RPC?

2. Is it possible that a single errant worker (Hadoop mapper) can
delay the completion of a superstep?

3. While running Giraph I saw 1 master_zookeeper and 3 workers. Does it
mean that in the default config ZooKeeper runs on the master mapper?

-Thanks,
Prasenjit

Re: Some giraph implementation questions..

Posted by Avery Ching <ac...@apache.org>.
On 6/19/12 6:01 PM, prasenjit mukherjee wrote:
> Ping!! Any suggestions/responses.
>
>
> On 6/17/12, prasenjit mukherjee<pr...@gmail.com>  wrote:
>> Thanks for the quick response. Additional questions...
>>
>> 1. It seems that the initial graph/vertices are loaded only during
>> startup/setup time. If I understand correctly, further additions to
>> the graph can be done only by implementing MutableVertex? Using
>> MutableVertex it should also be possible to take streaming data as
>> input and add new vertices/edges. Is that correct?
Yeah, you could do this.  Generally, however, Giraph is more about batch 
processing than streaming.

>> 2. Is there a simpler way to debug Giraph code (user + platform)? The
>> following approach (
>> http://ben-tech.blogspot.in/2011/08/how-to-debug-hadoop-mapreduce-jobs-in.html
>> ) requires a driver class to run. Are there any ready-made utility
>> classes for debugging Giraph in Eclipse?

Not that I know of. But the author's approach can be used for Giraph,
since a Giraph job is just an MR job. The tests, for instance, can run
with LocalJobRunner.


Re: Some giraph implementation questions..

Posted by prasenjit mukherjee <pr...@gmail.com>.
Ping!! Any suggestions/responses.


Re: Some giraph implementation questions..

Posted by prasenjit mukherjee <pr...@gmail.com>.
Thanks for the quick response. Additional questions...

1. It seems that the initial graph/vertices are loaded only during
startup/setup time. If I understand correctly, further additions to
the graph can be done only by implementing MutableVertex? Using
MutableVertex it should also be possible to take streaming data as
input and add new vertices/edges. Is that correct?

2. Is there a simpler way to debug Giraph code (user + platform)? The
following approach (
http://ben-tech.blogspot.in/2011/08/how-to-debug-hadoop-mapreduce-jobs-in.html
) requires a driver class to run. Are there any ready-made utility
classes for debugging Giraph in Eclipse?

-Thanks,
Prasenjit

Re: Some giraph implementation questions..

Posted by Avery Ching <ac...@apache.org>.
On 6/16/12 10:54 PM, prasenjit mukherjee wrote:
> 1. Is it the master that ensures the sendToNeighbours() call actually
> succeeds, or is that handled entirely by the underlying Hadoop RPC?
Prior to the checkpoint, all messages must be guaranteed to be sent and 
delivered by all clients.

> 2. Is it possible that a single errant worker (Hadoop mapper) can
> delay the completion of a superstep?
Yes.  This is possible, especially in skewed distributions.
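
To illustrate why one slow worker holds up everyone: a superstep only
ends once every worker has reached the barrier. The following is a rough
sketch in plain Java, not Giraph code. The class name, the method
runSuperstep, and the sleep times are all made up for illustration, and
Thread.sleep stands in for a worker's compute phase.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical simulation of a BSP-style superstep barrier (not Giraph API).
public class SuperstepBarrierSketch {

    /**
     * Runs one simulated superstep. Each worker sleeps for its assigned
     * compute time and then signals the barrier. Returns the elapsed
     * wall-clock time in milliseconds once all workers have finished.
     */
    public static long runSuperstep(long[] workerMillis) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workerMillis.length);
        CountDownLatch barrier = new CountDownLatch(workerMillis.length);
        long start = System.nanoTime();
        try {
            for (long millis : workerMillis) {
                pool.submit(() -> {
                    try {
                        Thread.sleep(millis); // stand-in for vertex computation
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        barrier.countDown(); // this worker is done
                    }
                });
            }
            barrier.await(); // the superstep ends only when every worker is done
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long balanced = runSuperstep(new long[] {50, 50, 50});
        long skewed = runSuperstep(new long[] {50, 50, 400});
        System.out.println("balanced superstep took about " + balanced + " ms");
        System.out.println("skewed superstep took about " + skewed + " ms");
    }
}
```

In the skewed run, two workers finish early and then sit idle until the
third one arrives, so the superstep takes roughly as long as the slowest
worker.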

> 3. While running Giraph I saw 1 master_zookeeper and 3 workers. Does it
> mean that in the default config ZooKeeper runs on the master mapper?

Yes. Currently, the master thread and the ZooKeeper service run on the
same mapper.

> -Thanks,
> Prasenjit