Posted to dev@giraph.apache.org by "Maja Kabiljo (JIRA)" <ji...@apache.org> on 2012/10/05 17:22:04 UTC

[jira] [Created] (GIRAPH-357) Don't try to combine if there is only one message

Maja Kabiljo created GIRAPH-357:
-----------------------------------

             Summary: Don't try to combine if there is only one message
                 Key: GIRAPH-357
                 URL: https://issues.apache.org/jira/browse/GIRAPH-357
             Project: Giraph
          Issue Type: Improvement
            Reporter: Maja Kabiljo


In SendMessageCache, we call the combiner even if we have just one message. Combining is fairly expensive since we recreate the message object and the list. With default settings and a bigger graph, PageRankBenchmark shows a 10-15% superstep speedup if we don't call the combiner when we have a single message.
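
As an illustration of the idea only (a minimal sketch, not the attached patch): the class, the `Combiner` interface and the method names below are hypothetical stand-ins and do not mirror Giraph's real SendMessageCache or combiner API. The sketch assumes double-valued messages and shows the first message for a destination being stored as-is, with the combiner and the list re-creation used only from the second message on.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sender-side cache; not Giraph's actual SendMessageCache. */
final class SingleMessageSkipSketch {
  /** Hypothetical combiner: folds a list of messages into one message. */
  interface Combiner {
    double combine(List<Double> messages);
  }

  private final Map<Long, List<Double>> cache = new HashMap<>();
  private final Combiner combiner;
  private int combinerCalls = 0;

  SingleMessageSkipSketch(Combiner combiner) {
    this.combiner = combiner;
  }

  /** Caches a message and combines only when there is something to combine. */
  void addMessage(long destVertexId, double message) {
    List<Double> messages =
        cache.computeIfAbsent(destVertexId, id -> new ArrayList<>());
    messages.add(message);
    // With a single message the combiner could only rebuild the same value,
    // so skip the call and the list re-creation entirely.
    if (combiner == null || messages.size() == 1) {
      return;
    }
    combinerCalls++;
    List<Double> combined = new ArrayList<>();
    combined.add(combiner.combine(messages));
    cache.put(destVertexId, combined);
  }

  /** Returns the cached messages for a destination (empty if none). */
  List<Double> messagesFor(long destVertexId) {
    return cache.getOrDefault(destVertexId, new ArrayList<>());
  }

  /** Number of times the combiner was actually invoked. */
  int combinerCalls() {
    return combinerCalls;
  }

  public static void main(String[] args) {
    SingleMessageSkipSketch sketch = new SingleMessageSkipSketch(
        messages -> messages.stream().mapToDouble(Double::doubleValue).sum());
    sketch.addMessage(1L, 0.25); // first message for vertex 1: no combiner call
    sketch.addMessage(2L, 0.5);  // first message for vertex 2: no combiner call
    sketch.addMessage(1L, 0.5);  // second message for vertex 1: combined
    System.out.println(sketch.messagesFor(1L) + " calls=" + sketch.combinerCalls());
  }
}
```

When most destinations receive a single message between flushes, almost every `addMessage` call takes the early-return path, which is the case the measurement above refers to.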

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470493#comment-13470493 ] 

Avery Ching commented on GIRAPH-357:
------------------------------------

This is a great find.  I would suggest adding to the code below an option to use / not use a client-side combiner.

{code}
  public SendMessageCache(ImmutableClassesGiraphConfiguration conf) {
    if (conf.getVertexCombinerClass() == null) {
      this.combiner = null;
    } else {
      this.combiner = conf.createVertexCombiner();
    }
  }
{code}
                
> Don't try to combine if there is only one message
> -------------------------------------------------
>
>                 Key: GIRAPH-357
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-357
>             Project: Giraph
>          Issue Type: Improvement
>            Reporter: Maja Kabiljo
>            Assignee: Maja Kabiljo
>         Attachments: GIRAPH-357.patch
>
>
> In SendMessageCache, we call the combiner even if we have just one message. Combining is fairly expensive since we recreate the message object and the list. With default settings and a bigger graph, PageRankBenchmark shows a 10-15% superstep speedup if we don't call the combiner when we have a single message.


[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Eli Reisman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471936#comment-13471936 ] 

Eli Reisman commented on GIRAPH-357:
------------------------------------

I have not used the combining features much. If you are combining messages at the sender side, the benefit should come only if the messages in SendMessageCache build up long enough to have enough queued to the same destination that it's worth combining them, right? The more often you send a burst of cached messages, the less often the combiner will have built up enough messages for combining to actually save us some resources? And the kind of combining operation, and whether the messages getting into the cache are combinable at all, differs between algorithms?

So, in my naive view of combining, it seems:

1. Running the combiner function on bundles of outgoing messages from the cache to a given worker might need to be tuned per-application?

2. Running it below some threshold of # of messages-per-outgoing-cache-bundle will always be silly/inefficient, such as combining on every 1-message send. BTW: when do we ever (in the current form) send just one message at a time? It seems like this could only happen on the final flush of the cache at the end of a superstep?

So...would this be something we would tune with the "# of cached messages per-worker before flushing cache" GiraphConfiguration dash-D option, per application, rather than in code, assuming this algorithm needs a client-side combiner? If we always send X number of messages, the combiner should always have X or so to work with in the hopes of matching and reducing a few before serialization?

When you say "serialize" do you mean on the network, or spill to disk for later sending at the end of the superstep? I'm assuming the former? One of the things I have been very aware of during GIRAPH-328/322 is the fact that it's one thing to carefully keep a single reference to something on the send side (for example) but it's entirely another to innocently serialize it and end up with N unique copies at the far end of the deserialization. Is there some overarching idea here about how to minimize this? One thing I like about the idea (just the idea so far!) of 322 is that on both the client and recv sides, the original message reference can be shared without N copies being created during ser/deser.
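
To make the "N unique copies" concern concrete, here is a small sketch under stated assumptions. Giraph messages are Hadoop Writables; plain `java.io` serialization is used below only as a self-contained stand-in, and the class and method names are hypothetical. One payload object is serialized once per destination, and each deserialization yields a separate instance with equal contents.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

/** Hypothetical demo; uses java.io serialization instead of Writable. */
final class SharedReferenceSketch {
  /** Serializes the payload on its own, as if sent to one destination. */
  static byte[] serialize(HashMap<String, Integer> payload) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(payload);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  /** Rebuilds a payload from bytes; always creates a fresh object. */
  @SuppressWarnings("unchecked")
  static HashMap<String, Integer> deserialize(byte[] data) {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (HashMap<String, Integer>) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Sends one shared payload to n destinations and returns what arrives. */
  static List<HashMap<String, Integer>> roundTrip(
      HashMap<String, Integer> payload, int n) {
    List<HashMap<String, Integer>> received = new ArrayList<>();
    for (int i = 0; i < n; i++) {
      received.add(deserialize(serialize(payload)));
    }
    return received;
  }

  /** Counts distinct object instances by reference, not by equals(). */
  static int countDistinctInstances(List<?> objects) {
    Set<Object> identities =
        Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
    identities.addAll(objects);
    return identities.size();
  }

  public static void main(String[] args) {
    HashMap<String, Integer> payload = new HashMap<>();
    payload.put("label", 7);
    List<HashMap<String, Integer>> received = roundTrip(payload, 3);
    System.out.println("distinct instances: " + countDistinctInstances(received));
  }
}
```

The sender holds a single reference throughout, yet every round trip produces its own object, so any sharing has to be re-established explicitly on the receiving side.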

                

[jira] [Updated] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Maja Kabiljo updated GIRAPH-357:
--------------------------------

    Attachment: GIRAPH-357.patch
    

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470394#comment-13470394 ] 

Maja Kabiljo commented on GIRAPH-357:
-------------------------------------

Also, since recreating these objects is that expensive, we should explore the option of not combining for every single new message, but in small batches. If I remember correctly Alessandro was looking into this at some point, can you please comment?
                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470913#comment-13470913 ] 

Maja Kabiljo commented on GIRAPH-357:
-------------------------------------

Avery, what are the benefits of disabling sender side combining at this point?

Yes, serializing right away is definitely worth exploring. If it gives a significant speedup for one kind of application but hurts others, we can have it as an option. In the current implementation, when we send the same message to several nodes we'll still do the serialization N times, so the only possible drawback is the total amount of memory we use at some point. 

I ran some experiments with combining less often, but there was no speed change. I'll investigate a bit more; it seems weird to me that the change on the sender gives a speedup but on the receiver it doesn't.
                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471064#comment-13471064 ] 

Avery Ching commented on GIRAPH-357:
------------------------------------

The only benefit compared to what you did is to avoid a few more checks (not much).  Maybe we can just add arguments to determine the number of messages to combine with (client / server)?  -1 indicates never combine?
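
One way such an argument could behave, as a minimal sketch with assumed names: `CombineThresholdSketch` and `NEVER_COMBINE` are hypothetical and no actual Giraph option is implied. A buffer for one destination sums its pending double-valued messages once their count reaches the threshold, and -1 disables combining. Client and server could each hold an instance with its own threshold.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-destination buffer that sums double messages. */
final class CombineThresholdSketch {
  /** Sentinel for "never combine". */
  static final int NEVER_COMBINE = -1;

  private final int threshold;
  private final List<Double> pending = new ArrayList<>();
  private int combineCalls = 0;

  CombineThresholdSketch(int threshold) {
    // A threshold of 1 would combine single messages, which is pointless.
    if (threshold != NEVER_COMBINE && threshold < 2) {
      throw new IllegalArgumentException("threshold must be -1 or at least 2");
    }
    this.threshold = threshold;
  }

  /** Buffers a message; combines once the buffer reaches the threshold. */
  void add(double message) {
    pending.add(message);
    if (threshold == NEVER_COMBINE || pending.size() < threshold) {
      return;
    }
    double sum = 0.0;
    for (double m : pending) {
      sum += m;
    }
    // Replace the whole batch with its combined value.
    pending.clear();
    pending.add(sum);
    combineCalls++;
  }

  /** Messages currently buffered for this destination. */
  List<Double> pendingMessages() {
    return pending;
  }

  /** Number of times a batch was combined. */
  int combineCalls() {
    return combineCalls;
  }

  public static void main(String[] args) {
    CombineThresholdSketch client = new CombineThresholdSketch(3);
    CombineThresholdSketch server = new CombineThresholdSketch(NEVER_COMBINE);
    for (double m : new double[] {1.0, 2.0, 3.0}) {
      client.add(m);
      server.add(m);
    }
    System.out.println("client: " + client.pendingMessages()
        + ", server: " + server.pendingMessages());
  }
}
```

Note that the combined value counts as one pending message, so with a threshold of X the buffer combines again after X - 1 further messages.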
                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Alessandro Presta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470447#comment-13470447 ] 

Alessandro Presta commented on GIRAPH-357:
------------------------------------------

This change only helps when a lot of vertices have only one incoming message, right? Because this event (originalMessageCount = 0) only happens when the first message is received.

I remember playing a bit with that number (combining only every X messages) and not seeing any significant speedup, although intuition says we should.
If you have better evidence that it helps, we could even make the size configurable with a default of 10 or something. You could try running some benchmarks with different values and post the results.

                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Alessandro Presta (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470502#comment-13470502 ] 

Alessandro Presta commented on GIRAPH-357:
------------------------------------------

As per offline discussion, it would be interesting to evaluate the impact of combining on client, server or both.
We may find out, for example, that it's not worth it on the client, while on the server it may help to combine every X messages (just a hypothesis).

Also regarding the possibility of serializing straight away (which is doable if we drop combining), we have to be careful: it may give a speedup in some cases, but penalize us in some others since we can't reuse objects effectively.
To give a concrete example, in some label propagation implementations the message data contains a map that is sent to all neighbors. So all messages share that reference, as opposed to making copies of the map.
                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Avery Ching (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470511#comment-13470511 ] 

Avery Ching commented on GIRAPH-357:
------------------------------------

Agreed with everything [~apresta] wrote.  Serializing straight away on the client is a good strategy to check out, as you mentioned.
                

[jira] [Commented] (GIRAPH-357) Don't try to combine if there is only one message

Posted by "Maja Kabiljo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/GIRAPH-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13470482#comment-13470482 ] 

Maja Kabiljo commented on GIRAPH-357:
-------------------------------------

Yes, it helps only when there is one message, but when you have a large number of vertices that's almost always the case because of flushing.

I will try out not combining all the time and report back if I get any visible improvement.
                