Posted to user@giraph.apache.org by prasenjit mukherjee <pr...@gmail.com> on 2012/06/17 07:54:04 UTC
Some giraph implementation questions..
1. Is it the master that ensures a sendToNeighbours() call actually
succeeds, or is that handled entirely by the underlying Hadoop RPC?
2. Is it possible for a single errant worker (Hadoop mapper) to
delay the completion of a superstep?
3. While running Giraph I saw 1 master_zookeeper and 3 workers. Does that
mean that, in the default config, ZooKeeper runs on the master mapper?
-Thanks,
Prasenjit
Re: Some giraph implementation questions..
Posted by Avery Ching <ac...@apache.org>.
On 6/19/12 6:01 PM, prasenjit mukherjee wrote:
> Ping!! Any suggestions/responses.
>
>
> On 6/17/12, prasenjit mukherjee<pr...@gmail.com> wrote:
>> Thanks for the quick response. Additional questions...
>>
>> 1. It seems that the initial graph/vertices are loaded only during
>> startup/setup time. If I understand correctly, further additions to
>> the graph can be made only by implementing MutableVertex? Using
>> MutableVertex, it should also be possible to take streaming data as
>> input and add new vertices/edges. Is that correct?
Yeah, you could do this. Generally, however, Giraph is more about batch
processing than streaming.
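
A rough illustration of the batch-oriented behaviour mentioned above. This
is a toy model and not the Giraph API: the class and method names
(MutationSketch, requestAddVertex, applyMutations) are made up for this
sketch. It assumes that mutations requested during a superstep are only
queued and become visible once the superstep barrier has been passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy model (NOT the Giraph API): mutations requested in one superstep show up in the next. */
class MutationSketch {
    private final Set<Long> vertices = new TreeSet<>();
    private final List<Long> pendingAdds = new ArrayList<>();

    MutationSketch(long... initialVertexIds) {
        for (long id : initialVertexIds) {
            vertices.add(id);
        }
    }

    /** Hypothetical call made from a compute step: only records the request. */
    void requestAddVertex(long id) {
        pendingAdds.add(id);
    }

    /** Hypothetical barrier hook: applies all queued requests at once. */
    void applyMutations() {
        vertices.addAll(pendingAdds);
        pendingAdds.clear();
    }

    Set<Long> vertices() {
        return vertices;
    }

    public static void main(String[] args) {
        MutationSketch graph = new MutationSketch(1L, 2L);
        graph.requestAddVertex(3L);
        System.out.println("visible before barrier: " + graph.vertices().contains(3L));
        graph.applyMutations();
        System.out.println("visible after barrier:  " + graph.vertices().contains(3L));
    }
}
```

Under this model, a streaming source would have to be drained in batches at
superstep boundaries, which is one way to read the "batch rather than
streaming" remark.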
>> 2. Is there a simpler way to debug Giraph code (user + platform)? The
>> following approach (
>> http://ben-tech.blogspot.in/2011/08/how-to-debug-hadoop-mapreduce-jobs-in.html
>> ) requires a driver class to run. Are there any ready-made utility
>> classes for debugging Giraph in Eclipse?
Not that I know of. But the author's approach can be used for Giraph
(as it is just an MR job). The tests, for instance, can run with
LocalJobRunner.
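
A minimal sketch of the settings that typically switch a Hadoop 1.x job to
LocalJobRunner, so that it runs in a single JVM where an IDE debugger can
attach. The helper class LocalRunSettings is hypothetical. The two Hadoop
property names are the 1.x ones (newer releases use
mapreduce.framework.name and fs.defaultFS). The giraph.SplitMasterWorker
entry is an assumption about what a single-task Giraph run needs and
should be checked against your Giraph version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that collects settings for an in-process (LocalJobRunner) run. */
class LocalRunSettings {

    /** Returns the properties in a stable order so the output is reproducible. */
    static Map<String, String> localModeProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapred.job.tracker", "local");       // Hadoop 1.x: run the job with LocalJobRunner
        props.put("fs.default.name", "file:///");       // Hadoop 1.x: use the local file system
        props.put("giraph.SplitMasterWorker", "false"); // assumption: master and worker share one task
        return props;
    }

    /** Renders the properties as -Dkey=value options (honoured by jobs launched via ToolRunner). */
    static String asGenericOptions(Map<String, String> props) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : props.entrySet()) {
            joiner.add("-D" + entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(asGenericOptions(localModeProperties()));
    }
}
```

The same key/value pairs can instead be set on the job's Configuration
object in a driver class or unit test before the job is submitted.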
Re: Some giraph implementation questions..
Posted by prasenjit mukherjee <pr...@gmail.com>.
Ping!! Any suggestions/responses.
Re: Some giraph implementation questions..
Posted by prasenjit mukherjee <pr...@gmail.com>.
Thanks for the quick response. Additional questions...
1. It seems that the initial graph/vertices are loaded only during
startup/setup time. If I understand correctly, further additions to
the graph can be made only by implementing MutableVertex? Using
MutableVertex, it should also be possible to take streaming data as
input and add new vertices/edges. Is that correct?
2. Is there a simpler way to debug Giraph code (user + platform)? The
following approach (
http://ben-tech.blogspot.in/2011/08/how-to-debug-hadoop-mapreduce-jobs-in.html
) requires a driver class to run. Are there any ready-made utility
classes for debugging Giraph in Eclipse?
-Thanks,
Prasenjit
Re: Some giraph implementation questions..
Posted by Avery Ching <ac...@apache.org>.
On 6/16/12 10:54 PM, prasenjit mukherjee wrote:
> 1. Is it the master that ensures a sendToNeighbours() call actually
> succeeds, or is that handled entirely by the underlying Hadoop RPC?
Prior to the checkpoint, all messages must be guaranteed to be sent and
delivered by all clients.
> 2. Is it possible for a single errant worker (Hadoop mapper) to
> delay the completion of a superstep?
Yes. This is possible, especially in skewed distributions.
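
To make the effect of skew concrete, here is a toy cost model, not Giraph
code. The class SuperstepSketch, the modulo partitioning, and the use of
out-degree as a stand-in for work are all assumptions made for
illustration. Because every worker must reach the barrier before the
superstep ends, the superstep costs as much as the most heavily loaded
worker.

```java
import java.util.Arrays;

/** Toy model of one BSP superstep: the barrier opens only when the slowest worker is done. */
class SuperstepSketch {

    /** Assigns vertex i to worker (i % numWorkers) and sums the out-degrees per worker. */
    static long[] workerLoads(int[] outDegrees, int numWorkers) {
        long[] loads = new long[numWorkers];
        for (int vertexId = 0; vertexId < outDegrees.length; vertexId++) {
            loads[vertexId % numWorkers] += outDegrees[vertexId];
        }
        return loads;
    }

    /** The superstep finishes when the most loaded worker finishes. */
    static long superstepCost(long[] loads) {
        return Arrays.stream(loads).max().orElse(0L);
    }

    public static void main(String[] args) {
        int[] degrees = new int[12];
        Arrays.fill(degrees, 10);
        degrees[0] = 1000; // one hub vertex skews the distribution
        long[] loads = workerLoads(degrees, 3);
        System.out.println(Arrays.toString(loads) + " -> superstep cost " + superstepCost(loads));
    }
}
```

With the hub vertex, the loads are [1030, 40, 40], so the superstep costs
1030 units even though two of the three workers finish after 40.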
> 3. While running Giraph I saw 1 master_zookeeper and 3 workers. Does that
> mean that, in the default config, ZooKeeper runs on the master mapper?
Yes. Currently, the master thread and the ZooKeeper service run on the
same mapper.
> -Thanks,
> Prasenjit