Posted to user@giraph.apache.org by Matthew Saltz <sa...@gmail.com> on 2014/10/22 20:10:29 UTC

Multiple sendMessage calls vs. sendMessageToMultipleEdges

Hi everyone,

I have two questions:

*Question 1)* I'm using release 1.1.0 and I'm really confused about the
fact that I'm having massive performance differences in the following
scenario. I need to send one message from each vertex to a subset of its
neighbors (all that satisfy a certain condition). For that, I see two basic
options:

   a) Loop over all edges, making a call to sendMessage(target, message)
whenever target satisfies a condition I want, reusing the same IntWritable
for the target vertex by calling target.set(_)
   b) Loop over all edges, building up an ArrayList (or whatever) of
targets that satisfy the condition, and calling
sendMessageToMultipleEdges(targets.iterator(), message) at the end.
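
To make the two options concrete, here is a minimal Java sketch. It does not
use Giraph's classes: MessageSender and RecordingSender are hypothetical
stand-ins for the two send methods, with int ids and String messages in place
of Writables, so it only shows the shape of the two loops and says nothing
about how Giraph behaves internally.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.IntPredicate;

final class SendOptionsSketch {

  /** Hypothetical stand-in for the two send methods; not Giraph's real interface. */
  interface MessageSender {
    void sendMessage(int targetId, String message);
    void sendMessageToMultipleEdges(Iterator<Integer> targetIds, String message);
  }

  /** Counts calls and records deliveries so the two options can be compared. */
  static final class RecordingSender implements MessageSender {
    int singleCalls = 0;
    int bulkCalls = 0;
    final List<Integer> delivered = new ArrayList<>();

    @Override
    public void sendMessage(int targetId, String message) {
      singleCalls++;
      delivered.add(targetId);
    }

    @Override
    public void sendMessageToMultipleEdges(Iterator<Integer> targetIds, String message) {
      bulkCalls++;
      while (targetIds.hasNext()) {
        delivered.add(targetIds.next());
      }
    }
  }

  /** Option (a): one sendMessage call per neighbor that passes the filter. */
  static void sendOneByOne(int[] neighbors, IntPredicate wanted, String msg, MessageSender out) {
    for (int target : neighbors) {
      if (wanted.test(target)) {
        out.sendMessage(target, msg);
      }
    }
  }

  /** Option (b): collect the matching neighbors first, then make one bulk call. */
  static void sendInBulk(int[] neighbors, IntPredicate wanted, String msg, MessageSender out) {
    List<Integer> targets = new ArrayList<>();
    for (int target : neighbors) {
      if (wanted.test(target)) {
        targets.add(target);
      }
    }
    out.sendMessageToMultipleEdges(targets.iterator(), msg);
  }

  public static void main(String[] args) {
    int[] neighbors = {1, 2, 3, 4, 5, 6};
    IntPredicate even = id -> id % 2 == 0;

    RecordingSender a = new RecordingSender();
    sendOneByOne(neighbors, even, "hello", a);

    RecordingSender b = new RecordingSender();
    sendInBulk(neighbors, even, "hello", b);

    System.out.println("option (a): single calls = " + a.singleCalls + ", delivered = " + a.delivered);
    System.out.println("option (b): bulk calls = " + b.bulkCalls + ", delivered = " + b.delivered);
  }
}
```

Both options deliver to the same targets; the only difference visible at this
level is that (b) hands the whole target list to the sender in one call, which
leaves the sender free to batch.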

Surprisingly, I get much, much worse performance using option (a), which I
expected to be the faster one. So I looked in the code and eventually
found my way to SendMessageCache
<https://github.com/apache/giraph/blob/release-1.1/giraph-core/src/main/java/org/apache/giraph/comm/SendMessageCache.java>,
where it turns out that sendMessageToMultipleEdges ->
sendMessageToAllRequest(Iterator, Message) actually just loops over the
iterator, repeatedly calling sendMessageRequest (which is what I thought I
was doing in scenario (a)). I might have traced the code incorrectly, though.
Can anyone tell me what might be going on? I'm really puzzled by this.

*Question 2)* Is there a good way of sending a vertex's adjacency list to
its neighbors, without building up your own copy of an adjacency list and
then sending that? I'm going through the Edge iterable and building an
ArrayPrimitiveWritable of ids but it would be nice if I could somehow
access the underlying data structure behind the iterable or just wrap the
iterable as a writable somehow.

Thanks so much for the help,
Matthew Saltz

Re: Multiple sendMessage calls vs. sendMessageToMultipleEdges

Posted by Matthew Saltz <sa...@gmail.com>.
Actually, one more question: are there any disadvantages to enabling
giraph.oneToAllMsgSending? Is there any reason not to enable it by default?

Best,
Matthew

Re: Multiple sendMessage calls vs. sendMessageToMultipleEdges

Posted by Matthew Saltz <sa...@gmail.com>.
Lukas,

Thank you so much for the help. By 'the first class', you mean that
SendMessageToAllCache is not used unless I set the property to true, right?
I actually do have giraph.oneToAllMsgSending=true, so if that means it's
using SendMessageToAllCache, then everything makes much more sense. So I
guess it makes sense that case (b) would be much faster than case (a)? I
really appreciate it. And do you have any ideas about the second question I
asked? I think the answer is no, but I'm hoping I'm wrong.

Best,
Matthew




Re: Multiple sendMessage calls vs. sendMessageToMultipleEdges

Posted by Lukas Nalezenec <lu...@firma.seznam.cz>.
Hi Matthew,

See class SendMessageToAllCache. It's in the same directory as
SendMessageCache. The first class is not used by Giraph unless you set the
property giraph.oneToAllMsgSending to true.
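
A rough sketch of why that setting could matter. It rests on my reading that
SendMessageToAllCache groups the target ids by destination worker and keeps
one copy of the message per group, instead of one copy per target. This is
not Giraph's code: OneToAllSketch, workerOf and the modulo partitioning are
made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class OneToAllSketch {

  /** Hypothetical partitioning rule: vertex id modulo the number of workers. */
  static int workerOf(int vertexId, int workers) {
    return Math.floorMod(vertexId, workers);
  }

  /** Per-target caching: the payload is stored once for every target id. */
  static int payloadCopiesPerTarget(int[] targets) {
    return targets.length;
  }

  /**
   * One-to-all style caching: target ids are grouped by worker, so the payload
   * only needs to be stored once per worker that owns at least one target.
   */
  static Map<Integer, List<Integer>> groupByWorker(int[] targets, int workers) {
    Map<Integer, List<Integer>> byWorker = new TreeMap<>();
    for (int target : targets) {
      byWorker.computeIfAbsent(workerOf(target, workers), w -> new ArrayList<>()).add(target);
    }
    return byWorker;
  }

  public static void main(String[] args) {
    int[] targets = {1, 2, 3, 4, 5, 6, 7, 8};
    Map<Integer, List<Integer>> byWorker = groupByWorker(targets, 2);

    System.out.println("payload copies, one per target: " + payloadCopiesPerTarget(targets));
    System.out.println("payload copies, one per worker: " + byWorker.size());
    System.out.println("target ids grouped by worker:   " + byWorker);
  }
}
```

With many targets per worker and a large message, the grouped form stores and
ships far fewer payload bytes, which would be consistent with the bulk call
being faster when the property is set.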

Lukas
