You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@giraph.apache.org by Claudio Martella <cl...@gmail.com> on 2012/01/09 13:29:01 UTC

on the semantics of the combiner

Hello list,

for GIRAPH-45 I'm touching the incoming messages and hit an
interesting problem with the combiner semantics.
currently, my code fails testBspCombiner for the following reason:

SimpleSumCombiner::compute() returns a value even if there are no
messages in the iterator (in this case it returns 0) and for this
reason the vertices get activated at each superstep.

At each superstep, under-the-hood, I pass the combiner for each vertex
an Iterable, which can be empty:

    public Iterable<M> getMessages(I vertexId) {
    	Iterable<M> messages = inMessages.getMessages(vertexId);
    	if (combiner != null) {
    		M combinedMsg;
    		try {
    			combinedMsg = combiner.combine(vertexId, messages);
    		}  catch (IOException e) {
    			throw new RuntimeException("could not combine", e);
    		}
    		if (combinedMsg != null) {
    			List<M> tmp = new ArrayList<M>(1);
    			tmp.add(combinedMsg);
    			messages = tmp;
    		} else {
    			messages = new ArrayList<M>(0);
    		}
    	}
    	return messages;
    }

the Iterable returned by this methods is passed to
basicVertex.putMessages() right before the compute().
Now, the question is: who's wrong? The combiner code that returns a
sum of 0 over no values, or the framework that calls the combiner with
0 messages?



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

that was my point as well, max and sum are typical examples. it would
make it more expressive and powerful, as it extends the one or null
model, and possibly more elegant. useful is a different story and i
wouldn't know how to measure at this point (which was the convincing
part of avery's point).

On Tue, Jan 10, 2012 at 12:57 AM, Jakob Homan <jg...@gmail.com> wrote:
>> In my opinion that means reducing to a single message or none at all.
> C&A doesn't require this, however.  Hadoop's combiner interface, for
> instance, doesn't require a single  or no value to be returned; it has
> the same interface as a reducer, zero or more values.  Would adapting
> the semantics of Giraph's combiner to return a list of messages
> (possibly empty) make it more useful?
>
> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
> <cl...@gmail.com> wrote:
>> Yes, what is you say is completely reasonable, you convinced me :)
>>
>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching <ac...@apache.org> wrote:
>>> Combiners should be commutative and associative.  In my opinion that means
>>> reducing to a single message or none at all.  Can you think of a case when
>>> more than 1 message should be returned from a combiner?  I know that
>>> returning null isn't preferable in general, but I think that functionality
>>> (returning no messages), is nice to have and isn't a huge amount of work on
>>> our side.
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>
>>>> To clarify, I was not discussing the possibility for combine to return
>>>> null. I see why it would be useful, given that combine returns M,
>>>> there's no other way to let combiner ask not to send any message,
>>>> although i agree with Jakob, I also believe returning null should be
>>>> avoided but only used, roughly, as an init value for a
>>>> reference/pointer.
>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>> but maybe it's too much.
>>>>
>>>> As far as the size of the messages parameter, I agree with you that 0
>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>> was more a matter of synching it with the other methods getting the
>>>> messages parameter.
>>>> Probably, having a more clear javadoc will do the job here.
>>>>
>>>> What do you think?
>>>>
>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>
>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>> calling code (null checks, or not, since people usually will forget
>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>> it conceivable that one would want to write a combiner that returned
>>>>> something for an input of no parameters, ie combining the empty list
>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>> combining a single message would result in that message.
>>>>>
>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>  wrote:
>>>>>>
>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>
>>>>>>  /**
>>>>>>   * Combines message values for a particular vertex index.
>>>>>>   *
>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>   * @param msgList List of the messages to be combined
>>>>>>   * @return Message that is combined from {@link MsgList} or null if no
>>>>>>   *         message it to be sent
>>>>>>   * @throws IOException
>>>>>>   */
>>>>>>
>>>>>> I think we are somewhat vague on what a combiner can return to support
>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>> compute() algorithm.  I think it should be legal to return null from a
>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>
>>>>>> It seems like it would be an overhead to call a combiner when there are
>>>>>> 0
>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>> should
>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>> message
>>>>>> to have combine() being called.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>
>>>>>>> Hi Sebastian,
>>>>>>>
>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>> define explicitly the semantics of this scenario.
>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>>>>> and compute(), and act accordingly.
>>>>>>>
>>>>>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>>>>>> It's true that the sum of no values is 0, but it's also true that the
>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>> situation.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>> only
>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>> vertex
>>>>>>>> has been sent.
>>>>>>>>
>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>
>>>>>>>> --sebastian
>>>>>>>>
>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>
>>>>>>>>> Hello list,
>>>>>>>>>
>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>
>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>
>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>> vertex
>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>
>>>>>>>>>     public Iterable<M>    getMessages(I vertexId) {
>>>>>>>>>       Iterable<M>    messages = inMessages.getMessages(vertexId);
>>>>>>>>>       if (combiner != null) {
>>>>>>>>>               M combinedMsg;
>>>>>>>>>               try {
>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>> messages);
>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>                       throw new RuntimeException("could not combine",
>>>>>>>>> e);
>>>>>>>>>               }
>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>                       List<M>    tmp = new ArrayList<M>(1);
>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>                       messages = tmp;
>>>>>>>>>               } else {
>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>               }
>>>>>>>>>       }
>>>>>>>>>       return messages;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>> with
>>>>>>>>> 0 messages?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

Ok, i'll open a ticket with this issue. As the discussion about
warning vs exception seams still open, we can move it to JIRA where a
decision can be taken.

On Fri, Jan 13, 2012 at 10:24 PM, Jakob Homan <jg...@gmail.com> wrote:
> I think this is good, but really think a warning is enough, rather
> than an exception.  There's no reason to pre-emptively
>
> On Fri, Jan 13, 2012 at 1:03 PM, Sebastian Schelter <ss...@apache.org> wrote:
>> +1 on Iterable <= messages.size() also from me.
>>
>>
>> On 13.01.2012 19:51, Avery Ching wrote:
>>> +1
>>>
>>> I'm fine with this.  If we agree to return an Iterable, then we should
>>> make sure to either throw if the size of the Iterable > messages.size()
>>> to at the very least LOG.warn("This combiner is likely to be implemented
>>> wrong").  I prefer an exception, since we have no use case for expanding
>>> the set of messages.
>>>
>>> Also, I'd like to have something in the javadoc saying something like
>>> "While the number of messages returned can be equal to the same number
>>> of messages that was inputted, the purpose of the combiner is to reduced
>>> the number of messages from the input."
>>>
>>> Avery
>>>
>>> On 1/13/12 9:34 AM, Claudio Martella wrote:
>>>> Ok,
>>>>
>>>> I guess we can vote then about this, what do you think?
>>>> Shall we take 72h?
>>>>
>>>> I'm +1 for returning an iterable that can be empty.
>>>> I'm +1 for the returned iterable to be<= messages.size()
>>>>
>>>>
>>>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ss...@apache.org>
>>>> wrote:
>>>>> I think we should make the combiner return a list/iterable that can
>>>>> potentially be empty. However we should assume that the number of
>>>>> elements returned is smaller than or equal to the number of input
>>>>> elements (whats the use of a combiner if this is not given?). I also
>>>>> concur that the code should not depend on the combiner being applied
>>>>> (similar to the way combiners work in hadoop).
>>>>>
>>>>> --sebastian
>>>>>
>>>>> 2012/1/10 Jakob Homan<jg...@gmail.com>:
>>>>>> A composite object would essentially be a wrapper around a list and
>>>>>> introduce the need for all vertices to be ready to extract that list
>>>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>>>> messages.  If we allow zero or one return elements, the combiner would
>>>>>> have to create a composite object with a list of those four messages,
>>>>>> whereas if we return a list, it just skips that step and returns the
>>>>>> four messages.  Additionally, the receiving vertex would have to
>>>>>> handle the possibility of a composite object every time even though
>>>>>> the combiner may or may not have been run during the superstep, or
>>>>>> even included in that job (since combiners are optional to the job
>>>>>> itself).  It would be better if one could write a Giraph application
>>>>>> that was completely agnostic of whether or not a combiner was
>>>>>> included.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>>> <cl...@gmail.com>  wrote:
>>>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>>>> stand :) Once you give them any API they have the power to do anything
>>>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>>>> we were talking about simplifying life to users :). I think it would
>>>>>>> be more flexible (for the present and for the future) and also more
>>>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>>>> for free).
>>>>>>>
>>>>>>> Very cool discussion.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jg...@gmail.com>
>>>>>>> wrote:
>>>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>>>> so they can't send messages to other vertices.
>>>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>>>> vertex
>>>>>>>> before processing as well).  When would they be forwarded to their
>>>>>>>> new
>>>>>>>> destinations at that point?  It would be possible to get into a
>>>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>>>> ever
>>>>>>>> actually be done.
>>>>>>>>
>>>>>>>> That being said, our inability to think of a good application doesn't
>>>>>>>> mean there won't be one in the future, and it's probably better to be
>>>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>>>> not
>>>>>>>> be the same number of elements as the original, less than, or even
>>>>>>>> more than).
>>>>>>>>
>>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>> Agreed.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ac...@apache.org>
>>>>>>>> wrote:
>>>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>>>> sent.
>>>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>>>> work
>>>>>>>>> correctly without it (since it's never guaranteed to actually be
>>>>>>>>> called).
>>>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>>>> so they
>>>>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>>>
>>>>>>>>> While I think that grouping behavior could actually be
>>>>>>>>> implemented within a
>>>>>>>>> message object (still reducing the number of messages to 1 or 0)
>>>>>>>>> I suppose
>>>>>>>>> that in some simple cases (i.e. grouping), it might be easier by
>>>>>>>>> doing it in
>>>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>>>> suppose I'm
>>>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>>>> combiner to do.
>>>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>>>> object, it
>>>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>>>
>>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>>>> Ok, now i see where you're going. I guess that the thing here is
>>>>>>>>>> that
>>>>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>>>>> concretely it would probably need some local data related to D
>>>>>>>>>> (edges
>>>>>>>>>> values? vertexvalue?).
>>>>>>>>>> I also think that k>    n is also possible in principle and we
>>>>>>>>>> could let
>>>>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>>>>> that letting the user send k messages in the combiner is useful
>>>>>>>>>> (and
>>>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>>>> should do
>>>>>>>>>> so).
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>>>>> which would have then then been sent to, say, M.  This would
>>>>>>>>>>> save the
>>>>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not
>>>>>>>>>>> saying
>>>>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>>>>> that combiners can generate messages bound for vertices other
>>>>>>>>>>> than the
>>>>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>>>>> discussed.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>> i'm not sure i understand what you'd save here. if the two
>>>>>>>>>>>> messages
>>>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>>>> worker D,
>>>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>>>> instead of 2.
>>>>>>>>>>>> right?
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>>> returned by
>>>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>>>> Right.  Grouping would be the most common case.  It would be
>>>>>>>>>>>>> possible
>>>>>>>>>>>>> to be great than k, as well.  For instance, consider two
>>>>>>>>>>>>> messages,
>>>>>>>>>>>>> both generated on the same worker (W) by two two different
>>>>>>>>>>>>> vertices,
>>>>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get
>>>>>>>>>>>>> both of
>>>>>>>>>>>>> these messages, do some work on them, as it would have
>>>>>>>>>>>>> knowledge of
>>>>>>>>>>>>> both, and generate some arbitrary number of messages bound
>>>>>>>>>>>>> for other
>>>>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original
>>>>>>>>>>>>> messages).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>>> returned by
>>>>>>>>>>>>>> the combiner, can still be smaller than n, the size of the
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>> parameter. as a first example, you can imagine your vertex
>>>>>>>>>>>>>> receiving
>>>>>>>>>>>>>> semantically-different classes/types of messages, and you
>>>>>>>>>>>>>> can imagine
>>>>>>>>>>>>>> willing to be summarizing them in different messages, i.e.
>>>>>>>>>>>>>> if your
>>>>>>>>>>>>>> messages come along with labels or just simply by the source
>>>>>>>>>>>>>> vertex,
>>>>>>>>>>>>>> if required by the algorithm, think of label propagation to
>>>>>>>>>>>>>> have just
>>>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think
>>>>>>>>>>>>>>> of why I
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>> want to use a combiner to expand the number of messages.
>>>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>>> In my opinion that means reducing to a single message or
>>>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
>>>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>>>> instance, doesn't require a single  or no value to be
>>>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of
>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>>>>> <cl...@gmail.com>      wrote:
>>>>>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you
>>>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery
>>>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my
>>>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you
>>>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?
>>>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a
>>>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for
>>>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine
>>>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send
>>>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning
>>>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what
>>>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree
>>>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods
>>>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job
>>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>>>> Homan<jg...@gmail.com>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people
>>>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex
>>>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting
>>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link
>>>>>>>>>>>>>>>>>>>>> MsgList} or
>>>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be
>>>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must
>>>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>>> messages, as it's the case of
>>>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> In the particular example, I believe the
>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>>>> is at least
>>>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>>>> and hit an
>>>>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>>>> if there
>>>>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>>>> 0) and for
>>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>>>> combiner for
>>>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>      public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>>>> vertexId) {
>>>>>>>>>>>>>>>>>>>>>>>>        Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>>>>>>>>>>                        combinedMsg =
>>>>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>>>                        throw new
>>>>>>>>>>>>>>>>>>>>>>>> RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>>>                        List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>>>>>>>>>>                        messages = new
>>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>>>> code that
>>>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>>>> calls the
>>>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>     Claudio Martella
>>>>>>>     claudio.martella@gmail.com
>>>>
>>>>
>>>
>>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

I think this is good, but really think a warning is enough, rather
than an exception.  There's no reason to pre-emptively

On Fri, Jan 13, 2012 at 1:03 PM, Sebastian Schelter <ss...@apache.org> wrote:
> +1 on Iterable <= messages.size() also from me.
>
>
> On 13.01.2012 19:51, Avery Ching wrote:
>> +1
>>
>> I'm fine with this.  If we agree to return an Iterable, then we should
>> make sure to either throw if the size of the Iterable > messages.size()
>> to at the very least LOG.warn("This combiner is likely to be implemented
>> wrong").  I prefer an exception, since we have no use case for expanding
>> the set of messages.
>>
>> Also, I'd like to have something in the javadoc saying something like
>> "While the number of messages returned can be equal to the same number
>> of messages that was inputted, the purpose of the combiner is to reduced
>> the number of messages from the input."
>>
>> Avery
>>
>> On 1/13/12 9:34 AM, Claudio Martella wrote:
>>> Ok,
>>>
>>> I guess we can vote then about this, what do you think?
>>> Shall we take 72h?
>>>
>>> I'm +1 for returning an iterable that can be empty.
>>> I'm +1 for the returned iterable to be<= messages.size()
>>>
>>>
>>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ss...@apache.org>
>>> wrote:
>>>> I think we should make the combiner return a list/iterable that can
>>>> potentially be empty. However we should assume that the number of
>>>> elements returned is smaller than or equal to the number of input
>>>> elements (whats the use of a combiner if this is not given?). I also
>>>> concur that the code should not depend on the combiner being applied
>>>> (similar to the way combiners work in hadoop).
>>>>
>>>> --sebastian
>>>>
>>>> 2012/1/10 Jakob Homan<jg...@gmail.com>:
>>>>> A composite object would essentially be a wrapper around a list and
>>>>> introduce the need for all vertices to be ready to extract that list
>>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>>> messages.  If we allow zero or one return elements, the combiner would
>>>>> have to create a composite object with a list of those four messages,
>>>>> whereas if we return a list, it just skips that step and returns the
>>>>> four messages.  Additionally, the receiving vertex would have to
>>>>> handle the possibility of a composite object every time even though
>>>>> the combiner may or may not have been run during the superstep, or
>>>>> even included in that job (since combiners are optional to the job
>>>>> itself).  It would be better if one could write a Giraph application
>>>>> that was completely agnostic of whether or not a combiner was
>>>>> included.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>>> <cl...@gmail.com>  wrote:
>>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>>> stand :) Once you give them any API they have the power to do anything
>>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>>> we were talking about simplifying life to users :). I think it would
>>>>>> be more flexible (for the present and for the future) and also more
>>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>>> for free).
>>>>>>
>>>>>> Very cool discussion.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jg...@gmail.com>
>>>>>> wrote:
>>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>>> so they can't send messages to other vertices.
>>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>>> vertex
>>>>>>> before processing as well).  When would they be forwarded to their
>>>>>>> new
>>>>>>> destinations at that point?  It would be possible to get into a
>>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>>> ever
>>>>>>> actually be done.
>>>>>>>
>>>>>>> That being said, our inability to think of a good application doesn't
>>>>>>> mean there won't be one in the future, and it's probably better to be
>>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>>> not
>>>>>>> be the same number of elements as the original, less than, or even
>>>>>>> more than).
>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>> Agreed.
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ac...@apache.org>
>>>>>>> wrote:
>>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>>> sent.
>>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>>> work
>>>>>>>> correctly without it (since it's never guaranteed to actually be
>>>>>>>> called).
>>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>>> so they
>>>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>>
>>>>>>>> While I think that grouping behavior could actually be
>>>>>>>> implemented within a
>>>>>>>> message object (still reducing the number of messages to 1 or 0)
>>>>>>>> I suppose
>>>>>>>> that in some simple cases (i.e. grouping), it might be easier by
>>>>>>>> doing it in
>>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>>> suppose I'm
>>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>>> combiner to do.
>>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>>> object, it
>>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>>
>>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>>> Ok, now i see where you're going. I guess that the thing here is
>>>>>>>>> that
>>>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>>>> concretely it would probably need some local data related to D
>>>>>>>>> (edges
>>>>>>>>> values? vertexvalue?).
>>>>>>>>> I also think that k>    n is also possible in principle and we
>>>>>>>>> could let
>>>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>>>> that letting the user send k messages in the combiner is useful
>>>>>>>>> (and
>>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>>> should do
>>>>>>>>> so).
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>>>> which would have then then been sent to, say, M.  This would
>>>>>>>>>> save the
>>>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not
>>>>>>>>>> saying
>>>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>>>> that combiners can generate messages bound for vertices other
>>>>>>>>>> than the
>>>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>>>> discussed.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>> i'm not sure i understand what you'd save here. if the two
>>>>>>>>>>> messages
>>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>>> worker D,
>>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>>> instead of 2.
>>>>>>>>>>> right?
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>>> Right.  Grouping would be the most common case.  It would be
>>>>>>>>>>>> possible
>>>>>>>>>>>> to be great than k, as well.  For instance, consider two
>>>>>>>>>>>> messages,
>>>>>>>>>>>> both generated on the same worker (W) by two two different
>>>>>>>>>>>> vertices,
>>>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get
>>>>>>>>>>>> both of
>>>>>>>>>>>> these messages, do some work on them, as it would have
>>>>>>>>>>>> knowledge of
>>>>>>>>>>>> both, and generate some arbitrary number of messages bound
>>>>>>>>>>>> for other
>>>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original
>>>>>>>>>>>> messages).
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>>> returned by
>>>>>>>>>>>>> the combiner, can still be smaller than n, the size of the
>>>>>>>>>>>>> messages
>>>>>>>>>>>>> parameter. as a first example, you can imagine your vertex
>>>>>>>>>>>>> receiving
>>>>>>>>>>>>> semantically-different classes/types of messages, and you
>>>>>>>>>>>>> can imagine
>>>>>>>>>>>>> willing to be summarizing them in different messages, i.e.
>>>>>>>>>>>>> if your
>>>>>>>>>>>>> messages come along with labels or just simply by the source
>>>>>>>>>>>>> vertex,
>>>>>>>>>>>>> if required by the algorithm, think of label propagation to
>>>>>>>>>>>>> have just
>>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think
>>>>>>>>>>>>>> of why I
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>> want to use a combiner to expand the number of messages.
>>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>>> In my opinion that means reducing to a single message or
>>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
>>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>>> instance, doesn't require a single  or no value to be
>>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of
>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>>>> <cl...@gmail.com>      wrote:
>>>>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you
>>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery
>>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my
>>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you
>>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?
>>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a
>>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for
>>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine
>>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send
>>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning
>>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what
>>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension
>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree
>>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods
>>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job
>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>>> Homan<jg...@gmail.com>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people
>>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex
>>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting
>>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link
>>>>>>>>>>>>>>>>>>>> MsgList} or
>>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be
>>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must
>>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>>> messages, as it's the case of
>>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> In the particular example, I believe the
>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>>> is at least
>>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>>> and hit an
>>>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>>> if there
>>>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>>> 0) and for
>>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>>> combiner for
>>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>      public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>>> vertexId) {
>>>>>>>>>>>>>>>>>>>>>>>        Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>>>>>>>>>                        combinedMsg =
>>>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>>                        throw new
>>>>>>>>>>>>>>>>>>>>>>> RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>>                        List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>>>>>>>>>                        messages = new
>>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>>> code that
>>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>>> calls the
>>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>     Claudio Martella
>>>>>>     claudio.martella@gmail.com
>>>
>>>
>>
>

Re: on the semantics of the combiner

Posted by Sebastian Schelter <ss...@apache.org>.

+1 on Iterable <= messages.size() also from me.


On 13.01.2012 19:51, Avery Ching wrote:
> +1
> 
> I'm fine with this.  If we agree to return an Iterable, then we should
> make sure to either throw if the size of the Iterable > messages.size()
> to at the very least LOG.warn("This combiner is likely to be implemented
> wrong").  I prefer an exception, since we have no use case for expanding
> the set of messages.
> 
> Also, I'd like to have something in the javadoc saying something like
> "While the number of messages returned can be equal to the same number
> of messages that was inputted, the purpose of the combiner is to reduced
> the number of messages from the input."
> 
> Avery
> 
> On 1/13/12 9:34 AM, Claudio Martella wrote:
>> Ok,
>>
>> I guess we can vote then about this, what do you think?
>> Shall we take 72h?
>>
>> I'm +1 for returning an iterable that can be empty.
>> I'm +1 for the returned iterable to be<= messages.size()
>>
>>
>> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ss...@apache.org> 
>> wrote:
>>> I think we should make the combiner return a list/iterable that can
>>> potentially be empty. However we should assume that the number of
>>> elements returned is smaller than or equal to the number of input
>>> elements (whats the use of a combiner if this is not given?). I also
>>> concur that the code should not depend on the combiner being applied
>>> (similar to the way combiners work in hadoop).
>>>
>>> --sebastian
>>>
>>> 2012/1/10 Jakob Homan<jg...@gmail.com>:
>>>> A composite object would essentially be a wrapper around a list and
>>>> introduce the need for all vertices to be ready to extract that list
>>>> at all times.  For instance, a combiner passed 10 messages may be able
>>>> to combine 7 of them but do nothing with the other three, leaving four
>>>> messages.  If we allow zero or one return elements, the combiner would
>>>> have to create a composite object with a list of those four messages,
>>>> whereas if we return a list, it just skips that step and returns the
>>>> four messages.  Additionally, the receiving vertex would have to
>>>> handle the possibility of a composite object every time even though
>>>> the combiner may or may not have been run during the superstep, or
>>>> even included in that job (since combiners are optional to the job
>>>> itself).  It would be better if one could write a Giraph application
>>>> that was completely agnostic of whether or not a combiner was
>>>> included.
>>>>
>>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>>> <cl...@gmail.com>  wrote:
>>>>> I believe the argument of not letting users shoot their foot doesn't
>>>>> stand :) Once you give them any API they have the power to do anything
>>>>> wrong, as they already can with Giraph (or anything else for what it
>>>>> matters), by designing an algorithm wrongly (which would be what it
>>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>>> composite object would make the grouping (List<Group>) but I thought
>>>>> we were talking about simplifying life to users :). I think it would
>>>>> be more flexible (for the present and for the future) and also more
>>>>> elegant,  but not necessarily a must (although it'd come practically
>>>>> for free).
>>>>>
>>>>> Very cool discussion.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jg...@gmail.com> 
>>>>> wrote:
>>>>>>> Combiners can only modify the messages sent to a single vertex,
>>>>>>> so they can't send messages to other vertices.
>>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>>> be.  These new messages may be generated upon arrival at the
>>>>>> destination vertex (since combiners can be run on the receiving
>>>>>> vertex
>>>>>> before processing as well).  When would they be forwarded to their
>>>>>> new
>>>>>> destinations at that point?  It would be possible to get into a
>>>>>> feedback loop of messages jumping around before a superstep could
>>>>>> ever
>>>>>> actually be done.
>>>>>>
>>>>>> That being said, our inability to think of a good application doesn't
>>>>>> mean there won't be one in the future, and it's probably better to be
>>>>>> more flexible than try to impose what appears optimal now.  The
>>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>>> flexibility of allowing another list of messages (which may or may
>>>>>> not
>>>>>> be the same number of elements as the original, less than, or even
>>>>>> more than).
>>>>>>
>>>>>>> Good discussion (it's making me really think about this)!
>>>>>> Agreed.
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ac...@apache.org> 
>>>>>> wrote:
>>>>>>> The general idea of combiners is to reduce the number of messages
>>>>>>> sent.
>>>>>>>   Combiners are purely an optimization and the application should
>>>>>>> work
>>>>>>> correctly without it (since it's never guaranteed to actually be
>>>>>>> called).
>>>>>>>   Combiners can only modify the messages sent to a single vertex,
>>>>>>> so they
>>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>>
>>>>>>> While I think that grouping behavior could actually be
>>>>>>> implemented within a
>>>>>>> message object (still reducing the number of messages to 1 or 0)
>>>>>>> I suppose
>>>>>>> that in some simple cases (i.e. grouping), it might be easier by
>>>>>>> doing it in
>>>>>>> the combiner as you both have mentioned?  The only thing I
>>>>>>> suppose I'm
>>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>>   Generally, expanding messages is not what you want your
>>>>>>> combiner to do.
>>>>>>>   Also, since grouping behavior can be implemented in the message
>>>>>>> object, it
>>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>>
>>>>>>> Good discussion (it's making me really think about this)!
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>>> Ok, now i see where you're going. I guess that the thing here is
>>>>>>>> that
>>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>>> concretely it would probably need some local data related to D
>>>>>>>> (edges
>>>>>>>> values? vertexvalue?).
>>>>>>>> I also think that k>    n is also possible in principle and we
>>>>>>>> could let
>>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>>> that letting the user send k messages in the combiner is useful
>>>>>>>> (and
>>>>>>>> the grouping behavior shown by the label propagation example
>>>>>>>> should do
>>>>>>>> so).
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob
>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>>> which would have then then been sent to, say, M.  This would
>>>>>>>>> save the
>>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not
>>>>>>>>> saying
>>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>>> that combiners can generate messages bound for vertices other
>>>>>>>>> than the
>>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>>> discussed.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>> i'm not sure i understand what you'd save here. if the two
>>>>>>>>>> messages
>>>>>>>>>> were going to be expanded to k messages on the destination
>>>>>>>>>> worker D,
>>>>>>>>>> but you expand them on W, you end up sending k messages
>>>>>>>>>> instead of 2.
>>>>>>>>>> right?
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob
>>>>>>>>>> Homan<jg...@gmail.com>    wrote:
>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>> returned by
>>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>>> Right.  Grouping would be the most common case.  It would be
>>>>>>>>>>> possible
>>>>>>>>>>> to be great than k, as well.  For instance, consider two
>>>>>>>>>>> messages,
>>>>>>>>>>> both generated on the same worker (W) by two two different
>>>>>>>>>>> vertices,
>>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get
>>>>>>>>>>> both of
>>>>>>>>>>> these messages, do some work on them, as it would have
>>>>>>>>>>> knowledge of
>>>>>>>>>>> both, and generate some arbitrary number of messages bound
>>>>>>>>>>> for other
>>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original
>>>>>>>>>>> messages).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>> it doesn't have to be expand, k, the number of elements
>>>>>>>>>>>> returned by
>>>>>>>>>>>> the combiner, can still be smaller than n, the size of the
>>>>>>>>>>>> messages
>>>>>>>>>>>> parameter. as a first example, you can imagine your vertex
>>>>>>>>>>>> receiving
>>>>>>>>>>>> semantically-different classes/types of messages, and you
>>>>>>>>>>>> can imagine
>>>>>>>>>>>> willing to be summarizing them in different messages, i.e.
>>>>>>>>>>>> if your
>>>>>>>>>>>> messages come along with labels or just simply by the source
>>>>>>>>>>>> vertex,
>>>>>>>>>>>> if required by the algorithm, think of label propagation to
>>>>>>>>>>>> have just
>>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think
>>>>>>>>>>>>> of why I
>>>>>>>>>>>>> would
>>>>>>>>>>>>> want to use a combiner to expand the number of messages. 
>>>>>>>>>>>>> Can you?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>>> In my opinion that means reducing to a single message or
>>>>>>>>>>>>>>> none at
>>>>>>>>>>>>>>> all.
>>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner
>>>>>>>>>>>>>> interface, for
>>>>>>>>>>>>>> instance, doesn't require a single  or no value to be
>>>>>>>>>>>>>> returned; it
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>>> adapting
>>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>>> <cl...@gmail.com>      wrote:
>>>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you
>>>>>>>>>>>>>>> convinced me :)
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery
>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my
>>>>>>>>>>>>>>>> opinion
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you
>>>>>>>>>>>>>>>> think of a
>>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner? 
>>>>>>>>>>>>>>>> I know
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a
>>>>>>>>>>>>>>>> huge amount
>>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for
>>>>>>>>>>>>>>>>> combine to
>>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine
>>>>>>>>>>>>>>>>> returns M,
>>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send
>>>>>>>>>>>>>>>>> any message,
>>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning
>>>>>>>>>>>>>>>>> null should
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here,
>>>>>>>>>>>>>>>>> let
>>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what
>>>>>>>>>>>>>>>>> to combine
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension
>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree
>>>>>>>>>>>>>>>>> with you
>>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods
>>>>>>>>>>>>>>>>> getting
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job
>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob
>>>>>>>>>>>>>>>>> Homan<jg...@gmail.com>
>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra
>>>>>>>>>>>>>>>>>> complexity
>>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people
>>>>>>>>>>>>>>>>>> usually will
>>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining
>>>>>>>>>>>>>>>>>> the empty
>>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most
>>>>>>>>>>>>>>>>>> combiners,
>>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery
>>>>>>>>>>>>>>>>>> Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex
>>>>>>>>>>>>>>>>>>> index.
>>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting
>>>>>>>>>>>>>>>>>>> these
>>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link
>>>>>>>>>>>>>>>>>>> MsgList} or
>>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can
>>>>>>>>>>>>>>>>>>> return to
>>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to
>>>>>>>>>>>>>>>>>>> return null
>>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a
>>>>>>>>>>>>>>>>>>> combiner when
>>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be
>>>>>>>>>>>>>>>>>>> useful.
>>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must
>>>>>>>>>>>>>>>>>>> contain at least
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was
>>>>>>>>>>>>>>>>>>>> whether we
>>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready
>>>>>>>>>>>>>>>>>>>> to receive
>>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>>> messages, as it's the case of
>>>>>>>>>>>>>>>>>>>> BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> In the particular example, I believe the
>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's
>>>>>>>>>>>>>>>>>>>> also true
>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable
>>>>>>>>>>>>>>>>>>>> for this
>>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there
>>>>>>>>>>>>>>>>>>>>> is at least
>>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The
>>>>>>>>>>>>>>>>>>>>> messaging
>>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message
>>>>>>>>>>>>>>>>>>>>> for the
>>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages
>>>>>>>>>>>>>>>>>>>>>> and hit an
>>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the
>>>>>>>>>>>>>>>>>>>>>> following
>>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even
>>>>>>>>>>>>>>>>>>>>>> if there
>>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns
>>>>>>>>>>>>>>>>>>>>>> 0) and for
>>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the
>>>>>>>>>>>>>>>>>>>>>> combiner for
>>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>      public Iterable<M>          getMessages(I
>>>>>>>>>>>>>>>>>>>>>> vertexId) {
>>>>>>>>>>>>>>>>>>>>>>        Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>>>>>>>>                        combinedMsg =
>>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>>                        throw new
>>>>>>>>>>>>>>>>>>>>>> RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>>                        List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>>>>>>>>                        messages = new
>>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner
>>>>>>>>>>>>>>>>>>>>>> code that
>>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that
>>>>>>>>>>>>>>>>>>>>>> calls the
>>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- 
>>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -- 
>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -- 
>>>>>>>>>>     Claudio Martella
>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>
>>>>>
>>>>> -- 
>>>>>     Claudio Martella
>>>>>     claudio.martella@gmail.com
>>
>>
>

Re: on the semantics of the combiner

Posted by Avery Ching <ac...@apache.org>.

+1

I'm fine with this.  If we agree to return an Iterable, then we should 
make sure to either throw if the size of the Iterable > messages.size() 
to at the very least LOG.warn("This combiner is likely to be implemented 
wrong").  I prefer an exception, since we have no use case for expanding 
the set of messages.

Also, I'd like to have something in the javadoc saying something like 
"While the number of messages returned can be equal to the same number 
of messages that was inputted, the purpose of the combiner is to reduced 
the number of messages from the input."

Avery

On 1/13/12 9:34 AM, Claudio Martella wrote:
> Ok,
>
> I guess we can vote then about this, what do you think?
> Shall we take 72h?
>
> I'm +1 for returning an iterable that can be empty.
> I'm +1 for the returned iterable to be<= messages.size()
>
>
> On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter<ss...@apache.org>  wrote:
>> I think we should make the combiner return a list/iterable that can
>> potentially be empty. However we should assume that the number of
>> elements returned is smaller than or equal to the number of input
>> elements (whats the use of a combiner if this is not given?). I also
>> concur that the code should not depend on the combiner being applied
>> (similar to the way combiners work in hadoop).
>>
>> --sebastian
>>
>> 2012/1/10 Jakob Homan<jg...@gmail.com>:
>>> A composite object would essentially be a wrapper around a list and
>>> introduce the need for all vertices to be ready to extract that list
>>> at all times.  For instance, a combiner passed 10 messages may be able
>>> to combine 7 of them but do nothing with the other three, leaving four
>>> messages.  If we allow zero or one return elements, the combiner would
>>> have to create a composite object with a list of those four messages,
>>> whereas if we return a list, it just skips that step and returns the
>>> four messages.  Additionally, the receiving vertex would have to
>>> handle the possibility of a composite object every time even though
>>> the combiner may or may not have been run during the superstep, or
>>> even included in that job (since combiners are optional to the job
>>> itself).  It would be better if one could write a Giraph application
>>> that was completely agnostic of whether or not a combiner was
>>> included.
>>>
>>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>>> <cl...@gmail.com>  wrote:
>>>> I believe the argument of not letting users shoot their foot doesn't
>>>> stand :) Once you give them any API they have the power to do anything
>>>> wrong, as they already can with Giraph (or anything else for what it
>>>> matters), by designing an algorithm wrongly (which would be what it
>>>> would turn out to be a wrong combiner). It's definitely true that a
>>>> composite object would make the grouping (List<Group>) but I thought
>>>> we were talking about simplifying life to users :). I think it would
>>>> be more flexible (for the present and for the future) and also more
>>>> elegant,  but not necessarily a must (although it'd come practically
>>>> for free).
>>>>
>>>> Very cool discussion.
>>>>
>>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
>>>>> Yeah, the more I've thought about this, the more problematic it would
>>>>> be.  These new messages may be generated upon arrival at the
>>>>> destination vertex (since combiners can be run on the receiving vertex
>>>>> before processing as well).  When would they be forwarded to their new
>>>>> destinations at that point?  It would be possible to get into a
>>>>> feedback loop of messages jumping around before a superstep could ever
>>>>> actually be done.
>>>>>
>>>>> That being said, our inability to think of a good application doesn't
>>>>> mean there won't be one in the future, and it's probably better to be
>>>>> more flexible than try to impose what appears optimal now.  The
>>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>>> flexibility of allowing another list of messages (which may or may not
>>>>> be the same number of elements as the original, less than, or even
>>>>> more than).
>>>>>
>>>>>> Good discussion (it's making me really think about this)!
>>>>> Agreed.
>>>>>
>>>>>
>>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching<ac...@apache.org>  wrote:
>>>>>> The general idea of combiners is to reduce the number of messages sent.
>>>>>>   Combiners are purely an optimization and the application should work
>>>>>> correctly without it (since it's never guaranteed to actually be called).
>>>>>>   Combiners can only modify the messages sent to a single vertex, so they
>>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>>> messages) should be done by the vertex in the compute() method.
>>>>>>
>>>>>> While I think that grouping behavior could actually be implemented within a
>>>>>> message object (still reducing the number of messages to 1 or 0) I suppose
>>>>>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>>>>>> the combiner as you both have mentioned?  The only thing I suppose I'm
>>>>>> concerned about is letting users do something that is not optimal.
>>>>>>   Generally, expanding messages is not what you want your combiner to do.
>>>>>>   Also, since grouping behavior can be implemented in the message object, it
>>>>>> forces users to avoid shooting themselves in the foot.
>>>>>>
>>>>>> Good discussion (it's making me really think about this)!
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>> Ok, now i see where you're going. I guess that the thing here is that
>>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>>> concretely it would probably need some local data related to D (edges
>>>>>>> values? vertexvalue?).
>>>>>>> I also think that k>    n is also possible in principle and we could let
>>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>>> that letting the user send k messages in the combiner is useful (and
>>>>>>> the grouping behavior shown by the label propagation example should do
>>>>>>> so).
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>    wrote:
>>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>>> which would have then then been sent to, say, M.  This would save the
>>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>>> that combiners can generate messages bound for vertices other than the
>>>>>>>> original destination, which I don't know if that has even been
>>>>>>>> discussed.
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>>>>>> were going to be expanded to k messages on the destination worker D,
>>>>>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>>>>>> right?
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>    wrote:
>>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>   wrote:
>>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>>>>>> would
>>>>>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>>>>>
>>>>>>>>>>>> Avery
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>>>>> all.
>>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>>>>> has
>>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>>> adapting
>>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>>> <cl...@gmail.com>      wrote:
>>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> means
>>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>>>>> case
>>>>>>>>>>>>>>> when
>>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   /**
>>>>>>>>>>>>>>>>>>    * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>>>>    *
>>>>>>>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>>    * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>>>>>>>    */
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>>>>>   Perhaps we
>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>      public Iterable<M>          getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>>>>        Iterable<M>          messages =
>>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>>>>>>>                        combinedMsg =
>>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>>                        throw new RuntimeException("could not
>>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>>                        List<M>          tmp = new
>>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>>>>>>>                        messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>     Claudio Martella
>>>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>     Claudio Martella
>>>>>>>>>     claudio.martella@gmail.com
>>>>>>>
>>>>>>>
>>>>
>>>>
>>>> --
>>>>     Claudio Martella
>>>>     claudio.martella@gmail.com
>
>

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

Ok,

I guess we can vote then about this, what do you think?
Shall we take 72h?

I'm +1 for returning an iterable that can be empty.
I'm +1 for the returned iterable to be <= messages.size()


On Tue, Jan 10, 2012 at 9:48 PM, Sebastian Schelter <ss...@apache.org> wrote:
> I think we should make the combiner return a list/iterable that can
> potentially be empty. However we should assume that the number of
> elements returned is smaller than or equal to the number of input
> elements (whats the use of a combiner if this is not given?). I also
> concur that the code should not depend on the combiner being applied
> (similar to the way combiners work in hadoop).
>
> --sebastian
>
> 2012/1/10 Jakob Homan <jg...@gmail.com>:
>> A composite object would essentially be a wrapper around a list and
>> introduce the need for all vertices to be ready to extract that list
>> at all times.  For instance, a combiner passed 10 messages may be able
>> to combine 7 of them but do nothing with the other three, leaving four
>> messages.  If we allow zero or one return elements, the combiner would
>> have to create a composite object with a list of those four messages,
>> whereas if we return a list, it just skips that step and returns the
>> four messages.  Additionally, the receiving vertex would have to
>> handle the possibility of a composite object every time even though
>> the combiner may or may not have been run during the superstep, or
>> even included in that job (since combiners are optional to the job
>> itself).  It would be better if one could write a Giraph application
>> that was completely agnostic of whether or not a combiner was
>> included.
>>
>> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
>> <cl...@gmail.com> wrote:
>>> I believe the argument of not letting users shoot their foot doesn't
>>> stand :) Once you give them any API they have the power to do anything
>>> wrong, as they already can with Giraph (or anything else for what it
>>> matters), by designing an algorithm wrongly (which would be what it
>>> would turn out to be a wrong combiner). It's definitely true that a
>>> composite object would make the grouping (List<Group>) but I thought
>>> we were talking about simplifying life to users :). I think it would
>>> be more flexible (for the present and for the future) and also more
>>> elegant,  but not necessarily a must (although it'd come practically
>>> for free).
>>>
>>> Very cool discussion.
>>>
>>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jg...@gmail.com> wrote:
>>>>> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
>>>> Yeah, the more I've thought about this, the more problematic it would
>>>> be.  These new messages may be generated upon arrival at the
>>>> destination vertex (since combiners can be run on the receiving vertex
>>>> before processing as well).  When would they be forwarded to their new
>>>> destinations at that point?  It would be possible to get into a
>>>> feedback loop of messages jumping around before a superstep could ever
>>>> actually be done.
>>>>
>>>> That being said, our inability to think of a good application doesn't
>>>> mean there won't be one in the future, and it's probably better to be
>>>> more flexible than try to impose what appears optimal now.  The
>>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>>> flexibility of allowing another list of messages (which may or may not
>>>> be the same number of elements as the original, less than, or even
>>>> more than).
>>>>
>>>>>Good discussion (it's making me really think about this)!
>>>> Agreed.
>>>>
>>>>
>>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ac...@apache.org> wrote:
>>>>> The general idea of combiners is to reduce the number of messages sent.
>>>>>  Combiners are purely an optimization and the application should work
>>>>> correctly without it (since it's never guaranteed to actually be called).
>>>>>  Combiners can only modify the messages sent to a single vertex, so they
>>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>>> messages) should be done by the vertex in the compute() method.
>>>>>
>>>>> While I think that grouping behavior could actually be implemented within a
>>>>> message object (still reducing the number of messages to 1 or 0) I suppose
>>>>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>>>>> the combiner as you both have mentioned?  The only thing I suppose I'm
>>>>> concerned about is letting users do something that is not optimal.
>>>>>  Generally, expanding messages is not what you want your combiner to do.
>>>>>  Also, since grouping behavior can be implemented in the message object, it
>>>>> forces users to avoid shooting themselves in the foot.
>>>>>
>>>>> Good discussion (it's making me really think about this)!
>>>>>
>>>>> Avery
>>>>>
>>>>>
>>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>>
>>>>>> Ok, now i see where you're going. I guess that the thing here is that
>>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>>> concretely it would probably need some local data related to D (edges
>>>>>> values? vertexvalue?).
>>>>>> I also think that k>  n is also possible in principle and we could let
>>>>>> the user decide whether to use this power or not, once/if we agree
>>>>>> that letting the user send k messages in the combiner is useful (and
>>>>>> the grouping behavior shown by the label propagation example should do
>>>>>> so).
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>>
>>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>>> which would have then then been sent to, say, M.  This would save the
>>>>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>>> that combiners can generate messages bound for vertices other than the
>>>>>>> original destination, which I don't know if that has even been
>>>>>>> discussed.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>>>>> were going to be expanded to k messages on the destination worker D,
>>>>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>>>>> right?
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>>
>>>>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>>>>
>>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>>>>> would
>>>>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>>>>
>>>>>>>>>>> Avery
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>>>> all.
>>>>>>>>>>>>
>>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>>>> has
>>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>>> adapting
>>>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> means
>>>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>>>> case
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>>>> functionality
>>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>>>> of work
>>>>>>>>>>>>>> on
>>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>>>> return
>>>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>>    Claudio Martella
>>>>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>    Claudio Martella
>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>    Claudio Martella
>>>>>>>>    claudio.martella@gmail.com
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Sebastian Schelter <ss...@apache.org>.

I think we should make the combiner return a list/iterable that can
potentially be empty. However we should assume that the number of
elements returned is smaller than or equal to the number of input
elements (whats the use of a combiner if this is not given?). I also
concur that the code should not depend on the combiner being applied
(similar to the way combiners work in hadoop).

--sebastian

2012/1/10 Jakob Homan <jg...@gmail.com>:
> A composite object would essentially be a wrapper around a list and
> introduce the need for all vertices to be ready to extract that list
> at all times.  For instance, a combiner passed 10 messages may be able
> to combine 7 of them but do nothing with the other three, leaving four
> messages.  If we allow zero or one return elements, the combiner would
> have to create a composite object with a list of those four messages,
> whereas if we return a list, it just skips that step and returns the
> four messages.  Additionally, the receiving vertex would have to
> handle the possibility of a composite object every time even though
> the combiner may or may not have been run during the superstep, or
> even included in that job (since combiners are optional to the job
> itself).  It would be better if one could write a Giraph application
> that was completely agnostic of whether or not a combiner was
> included.
>
> On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
> <cl...@gmail.com> wrote:
>> I believe the argument of not letting users shoot their foot doesn't
>> stand :) Once you give them any API they have the power to do anything
>> wrong, as they already can with Giraph (or anything else for what it
>> matters), by designing an algorithm wrongly (which would be what it
>> would turn out to be a wrong combiner). It's definitely true that a
>> composite object would make the grouping (List<Group>) but I thought
>> we were talking about simplifying life to users :). I think it would
>> be more flexible (for the present and for the future) and also more
>> elegant,  but not necessarily a must (although it'd come practically
>> for free).
>>
>> Very cool discussion.
>>
>> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jg...@gmail.com> wrote:
>>>> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
>>> Yeah, the more I've thought about this, the more problematic it would
>>> be.  These new messages may be generated upon arrival at the
>>> destination vertex (since combiners can be run on the receiving vertex
>>> before processing as well).  When would they be forwarded to their new
>>> destinations at that point?  It would be possible to get into a
>>> feedback loop of messages jumping around before a superstep could ever
>>> actually be done.
>>>
>>> That being said, our inability to think of a good application doesn't
>>> mean there won't be one in the future, and it's probably better to be
>>> more flexible than try to impose what appears optimal now.  The
>>> benefit of forcing 0 or 1 message from a combiner seems less than the
>>> flexibility of allowing another list of messages (which may or may not
>>> be the same number of elements as the original, less than, or even
>>> more than).
>>>
>>>>Good discussion (it's making me really think about this)!
>>> Agreed.
>>>
>>>
>>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ac...@apache.org> wrote:
>>>> The general idea of combiners is to reduce the number of messages sent.
>>>>  Combiners are purely an optimization and the application should work
>>>> correctly without it (since it's never guaranteed to actually be called).
>>>>  Combiners can only modify the messages sent to a single vertex, so they
>>>> can't send messages to other vertices.  Any other work (i.e. sending
>>>> messages) should be done by the vertex in the compute() method.
>>>>
>>>> While I think that grouping behavior could actually be implemented within a
>>>> message object (still reducing the number of messages to 1 or 0) I suppose
>>>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>>>> the combiner as you both have mentioned?  The only thing I suppose I'm
>>>> concerned about is letting users do something that is not optimal.
>>>>  Generally, expanding messages is not what you want your combiner to do.
>>>>  Also, since grouping behavior can be implemented in the message object, it
>>>> forces users to avoid shooting themselves in the foot.
>>>>
>>>> Good discussion (it's making me really think about this)!
>>>>
>>>> Avery
>>>>
>>>>
>>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>>
>>>>> Ok, now i see where you're going. I guess that the thing here is that
>>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>>> concretely it would probably need some local data related to D (edges
>>>>> values? vertexvalue?).
>>>>> I also think that k>  n is also possible in principle and we could let
>>>>> the user decide whether to use this power or not, once/if we agree
>>>>> that letting the user send k messages in the combiner is useful (and
>>>>> the grouping behavior shown by the label propagation example should do
>>>>> so).
>>>>>
>>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>
>>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>>> which would have then then been sent to, say, M.  This would save the
>>>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>>> that combiners can generate messages bound for vertices other than the
>>>>>> original destination, which I don't know if that has even been
>>>>>> discussed.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>
>>>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>>>> were going to be expanded to k messages on the destination worker D,
>>>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>>>> right?
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>>>>
>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>>
>>>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>>>
>>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>>
>>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>>>> would
>>>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>>> all.
>>>>>>>>>>>
>>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>>> has
>>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>>> adapting
>>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>>> that
>>>>>>>>>>>>> means
>>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>>> case
>>>>>>>>>>>>> when
>>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>>> that
>>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>>> functionality
>>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>>> of work
>>>>>>>>>>>>> on
>>>>>>>>>>>>> our side.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>>> return
>>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>>>> be
>>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>>> combine()
>>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>>> model,
>>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>>> that 0
>>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>>>> list
>>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>>    Claudio Martella
>>>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>    Claudio Martella
>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>    Claudio Martella
>>>>>>>    claudio.martella@gmail.com
>>>>>
>>>>>
>>>>>
>>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

A composite object would essentially be a wrapper around a list and
introduce the need for all vertices to be ready to extract that list
at all times.  For instance, a combiner passed 10 messages may be able
to combine 7 of them but do nothing with the other three, leaving four
messages.  If we allow zero or one return elements, the combiner would
have to create a composite object with a list of those four messages,
whereas if we return a list, it just skips that step and returns the
four messages.  Additionally, the receiving vertex would have to
handle the possibility of a composite object every time even though
the combiner may or may not have been run during the superstep, or
even included in that job (since combiners are optional to the job
itself).  It would be better if one could write a Giraph application
that was completely agnostic of whether or not a combiner was
included.

On Tue, Jan 10, 2012 at 12:00 PM, Claudio Martella
<cl...@gmail.com> wrote:
> I believe the argument of not letting users shoot their foot doesn't
> stand :) Once you give them any API they have the power to do anything
> wrong, as they already can with Giraph (or anything else for what it
> matters), by designing an algorithm wrongly (which would be what it
> would turn out to be a wrong combiner). It's definitely true that a
> composite object would make the grouping (List<Group>) but I thought
> we were talking about simplifying life to users :). I think it would
> be more flexible (for the present and for the future) and also more
> elegant,  but not necessarily a must (although it'd come practically
> for free).
>
> Very cool discussion.
>
> On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jg...@gmail.com> wrote:
>>> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
>> Yeah, the more I've thought about this, the more problematic it would
>> be.  These new messages may be generated upon arrival at the
>> destination vertex (since combiners can be run on the receiving vertex
>> before processing as well).  When would they be forwarded to their new
>> destinations at that point?  It would be possible to get into a
>> feedback loop of messages jumping around before a superstep could ever
>> actually be done.
>>
>> That being said, our inability to think of a good application doesn't
>> mean there won't be one in the future, and it's probably better to be
>> more flexible than try to impose what appears optimal now.  The
>> benefit of forcing 0 or 1 message from a combiner seems less than the
>> flexibility of allowing another list of messages (which may or may not
>> be the same number of elements as the original, less than, or even
>> more than).
>>
>>>Good discussion (it's making me really think about this)!
>> Agreed.
>>
>>
>> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ac...@apache.org> wrote:
>>> The general idea of combiners is to reduce the number of messages sent.
>>>  Combiners are purely an optimization and the application should work
>>> correctly without it (since it's never guaranteed to actually be called).
>>>  Combiners can only modify the messages sent to a single vertex, so they
>>> can't send messages to other vertices.  Any other work (i.e. sending
>>> messages) should be done by the vertex in the compute() method.
>>>
>>> While I think that grouping behavior could actually be implemented within a
>>> message object (still reducing the number of messages to 1 or 0) I suppose
>>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>>> the combiner as you both have mentioned?  The only thing I suppose I'm
>>> concerned about is letting users do something that is not optimal.
>>>  Generally, expanding messages is not what you want your combiner to do.
>>>  Also, since grouping behavior can be implemented in the message object, it
>>> forces users to avoid shooting themselves in the foot.
>>>
>>> Good discussion (it's making me really think about this)!
>>>
>>> Avery
>>>
>>>
>>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>>
>>>> Ok, now i see where you're going. I guess that the thing here is that
>>>> the combiner would "act" like (on its behalf) D, and to do so
>>>> concretely it would probably need some local data related to D (edges
>>>> values? vertexvalue?).
>>>> I also think that k>  n is also possible in principle and we could let
>>>> the user decide whether to use this power or not, once/if we agree
>>>> that letting the user send k messages in the combiner is useful (and
>>>> the grouping behavior shown by the label propagation example should do
>>>> so).
>>>>
>>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>
>>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>>> which would have then then been sent to, say, M.  This would save the
>>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>>> it's a great example, but it is legal.  This is of course assuming
>>>>> that combiners can generate messages bound for vertices other than the
>>>>> original destination, which I don't know if that has even been
>>>>> discussed.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>>> <cl...@gmail.com>  wrote:
>>>>>>
>>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>>> were going to be expanded to k messages on the destination worker D,
>>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>>> right?
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>> the combiner, can still be smaller than n,
>>>>>>>
>>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>>
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>>
>>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>>
>>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>>> would
>>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>>
>>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>>> all.
>>>>>>>>>>
>>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>>> has
>>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>>> adapting
>>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>>
>>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>>> that
>>>>>>>>>>>> means
>>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>>> case
>>>>>>>>>>>> when
>>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>>> that
>>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>>> functionality
>>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>>> of work
>>>>>>>>>>>> on
>>>>>>>>>>>> our side.
>>>>>>>>>>>>
>>>>>>>>>>>> Avery
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>>> return
>>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>>> be
>>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>>> combine()
>>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>>> to
>>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>>> model,
>>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>>> that 0
>>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>>> awkward, it
>>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>>> the
>>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>>
>>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>>> to the
>>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>>> forget
>>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>>> returned
>>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>>> list
>>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>>> support
>>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>>>> from
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>>> there
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>> message
>>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>>    Claudio Martella
>>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>    Claudio Martella
>>>>>>>>    claudio.martella@gmail.com
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    Claudio Martella
>>>>>>    claudio.martella@gmail.com
>>>>
>>>>
>>>>
>>>
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

I believe the argument of not letting users shoot their foot doesn't
stand :) Once you give them any API they have the power to do anything
wrong, as they already can with Giraph (or anything else for what it
matters), by designing an algorithm wrongly (which would be what it
would turn out to be a wrong combiner). It's definitely true that a
composite object would make the grouping (List<Group>) but I thought
we were talking about simplifying life to users :). I think it would
be more flexible (for the present and for the future) and also more
elegant,  but not necessarily a must (although it'd come practically
for free).

Very cool discussion.

On Tue, Jan 10, 2012 at 8:30 PM, Jakob Homan <jg...@gmail.com> wrote:
>> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
> Yeah, the more I've thought about this, the more problematic it would
> be.  These new messages may be generated upon arrival at the
> destination vertex (since combiners can be run on the receiving vertex
> before processing as well).  When would they be forwarded to their new
> destinations at that point?  It would be possible to get into a
> feedback loop of messages jumping around before a superstep could ever
> actually be done.
>
> That being said, our inability to think of a good application doesn't
> mean there won't be one in the future, and it's probably better to be
> more flexible than try to impose what appears optimal now.  The
> benefit of forcing 0 or 1 message from a combiner seems less than the
> flexibility of allowing another list of messages (which may or may not
> be the same number of elements as the original, less than, or even
> more than).
>
>>Good discussion (it's making me really think about this)!
> Agreed.
>
>
> On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ac...@apache.org> wrote:
>> The general idea of combiners is to reduce the number of messages sent.
>>  Combiners are purely an optimization and the application should work
>> correctly without it (since it's never guaranteed to actually be called).
>>  Combiners can only modify the messages sent to a single vertex, so they
>> can't send messages to other vertices.  Any other work (i.e. sending
>> messages) should be done by the vertex in the compute() method.
>>
>> While I think that grouping behavior could actually be implemented within a
>> message object (still reducing the number of messages to 1 or 0) I suppose
>> that in some simple cases (i.e. grouping), it might be easier by doing it in
>> the combiner as you both have mentioned?  The only thing I suppose I'm
>> concerned about is letting users do something that is not optimal.
>>  Generally, expanding messages is not what you want your combiner to do.
>>  Also, since grouping behavior can be implemented in the message object, it
>> forces users to avoid shooting themselves in the foot.
>>
>> Good discussion (it's making me really think about this)!
>>
>> Avery
>>
>>
>> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>>
>>> Ok, now i see where you're going. I guess that the thing here is that
>>> the combiner would "act" like (on its behalf) D, and to do so
>>> concretely it would probably need some local data related to D (edges
>>> values? vertexvalue?).
>>> I also think that k>  n is also possible in principle and we could let
>>> the user decide whether to use this power or not, once/if we agree
>>> that letting the user send k messages in the combiner is useful (and
>>> the grouping behavior shown by the label propagation example should do
>>> so).
>>>
>>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>
>>>> Those two messages would have gone to D, been expanded to, say, 4,
>>>> which would have then then been sent to, say, M.  This would save the
>>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>>> it's a great example, but it is legal.  This is of course assuming
>>>> that combiners can generate messages bound for vertices other than the
>>>> original destination, which I don't know if that has even been
>>>> discussed.
>>>>
>>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>>> <cl...@gmail.com>  wrote:
>>>>>
>>>>> i'm not sure i understand what you'd save here. if the two messages
>>>>> were going to be expanded to k messages on the destination worker D,
>>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>>> right?
>>>>>
>>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>>
>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>> the combiner, can still be smaller than n,
>>>>>>
>>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>>> both generated on the same worker (W) by two two different vertices,
>>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>>> these messages, do some work on them, as it would have knowledge of
>>>>>> both, and generate some arbitrary number of messages bound for other
>>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>>
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>
>>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>>
>>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>>> would
>>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>>
>>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>>> all.
>>>>>>>>>
>>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>>> has
>>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>>> adapting
>>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>>
>>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>>> that
>>>>>>>>>>> means
>>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>>> case
>>>>>>>>>>> when
>>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>>> that
>>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>>> functionality
>>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>>> of work
>>>>>>>>>>> on
>>>>>>>>>>> our side.
>>>>>>>>>>>
>>>>>>>>>>> Avery
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>>> return
>>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>>> be
>>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>>> reference/pointer.
>>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>>> combine()
>>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>>> to
>>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>>> model,
>>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>>
>>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>>> that 0
>>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>>> awkward, it
>>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>>> the
>>>>>>>>>>>> messages parameter.
>>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>>
>>>>>>>>>>>> What do you think?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>>> to the
>>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>>> forget
>>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>>> returned
>>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>>> list
>>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>>   *
>>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>>> messages
>>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>>> null if
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>>   */
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>>> support
>>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>>> particular
>>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>>> from
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>>> there
>>>>>>>>>>>>>> are
>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>>> should
>>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>>> one
>>>>>>>>>>>>>> message
>>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>>    Claudio Martella
>>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>    Claudio Martella
>>>>>>>    claudio.martella@gmail.com
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>    Claudio Martella
>>>>>    claudio.martella@gmail.com
>>>
>>>
>>>
>>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

> Combiners can only modify the messages sent to a single vertex, so they can't send messages to other vertices.
Yeah, the more I've thought about this, the more problematic it would
be.  These new messages may be generated upon arrival at the
destination vertex (since combiners can be run on the receiving vertex
before processing as well).  When would they be forwarded to their new
destinations at that point?  It would be possible to get into a
feedback loop of messages jumping around before a superstep could ever
actually be done.

That being said, our inability to think of a good application doesn't
mean there won't be one in the future, and it's probably better to be
more flexible than try to impose what appears optimal now.  The
benefit of forcing 0 or 1 message from a combiner seems less than the
flexibility of allowing another list of messages (which may or may not
be the same number of elements as the original, less than, or even
more than).

>Good discussion (it's making me really think about this)!
Agreed.


On Tue, Jan 10, 2012 at 11:23 AM, Avery Ching <ac...@apache.org> wrote:
> The general idea of combiners is to reduce the number of messages sent.
>  Combiners are purely an optimization and the application should work
> correctly without it (since it's never guaranteed to actually be called).
>  Combiners can only modify the messages sent to a single vertex, so they
> can't send messages to other vertices.  Any other work (i.e. sending
> messages) should be done by the vertex in the compute() method.
>
> While I think that grouping behavior could actually be implemented within a
> message object (still reducing the number of messages to 1 or 0) I suppose
> that in some simple cases (i.e. grouping), it might be easier by doing it in
> the combiner as you both have mentioned?  The only thing I suppose I'm
> concerned about is letting users do something that is not optimal.
>  Generally, expanding messages is not what you want your combiner to do.
>  Also, since grouping behavior can be implemented in the message object, it
> forces users to avoid shooting themselves in the foot.
>
> Good discussion (it's making me really think about this)!
>
> Avery
>
>
> On 1/10/12 10:32 AM, Claudio Martella wrote:
>>
>> Ok, now i see where you're going. I guess that the thing here is that
>> the combiner would "act" like (on its behalf) D, and to do so
>> concretely it would probably need some local data related to D (edges
>> values? vertexvalue?).
>> I also think that k>  n is also possible in principle and we could let
>> the user decide whether to use this power or not, once/if we agree
>> that letting the user send k messages in the combiner is useful (and
>> the grouping behavior shown by the label propagation example should do
>> so).
>>
>> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>
>>> Those two messages would have gone to D, been expanded to, say, 4,
>>> which would have then then been sent to, say, M.  This would save the
>>> sending of the two to D and send the 4 directly to M.  I'm not saying
>>> it's a great example, but it is legal.  This is of course assuming
>>> that combiners can generate messages bound for vertices other than the
>>> original destination, which I don't know if that has even been
>>> discussed.
>>>
>>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>>> <cl...@gmail.com>  wrote:
>>>>
>>>> i'm not sure i understand what you'd save here. if the two messages
>>>> were going to be expanded to k messages on the destination worker D,
>>>> but you expand them on W, you end up sending k messages instead of 2.
>>>> right?
>>>>
>>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>>>
>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>> the combiner, can still be smaller than n,
>>>>>
>>>>> Right.  Grouping would be the most common case.  It would be possible
>>>>> to be great than k, as well.  For instance, consider two messages,
>>>>> both generated on the same worker (W) by two two different vertices,
>>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>>> these messages, do some work on them, as it would have knowledge of
>>>>> both, and generate some arbitrary number of messages bound for other
>>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>>
>>>>>
>>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>>> <cl...@gmail.com>  wrote:
>>>>>>
>>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>>> semantically-different classes/types of messages, and you can imagine
>>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>>> messages come along with labels or just simply by the source vertex,
>>>>>> if required by the algorithm, think of label propagation to have just
>>>>>> an example, or some sort of labeled-pagerank.
>>>>>>
>>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>
>>>>>>  wrote:
>>>>>>>
>>>>>>> I agree that C&A doesn't require it, however, I can't think of why I
>>>>>>> would
>>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>>>
>>>>>>>>> In my opinion that means reducing to a single message or none at
>>>>>>>>> all.
>>>>>>>>
>>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>>> instance, doesn't require a single  or no value to be returned; it
>>>>>>>> has
>>>>>>>> the same interface as a reducer, zero or more values.  Would
>>>>>>>> adapting
>>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>>> (possibly empty) make it more useful?
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>>>
>>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> Combiners should be commutative and associative.  In my opinion
>>>>>>>>>> that
>>>>>>>>>> means
>>>>>>>>>> reducing to a single message or none at all.  Can you think of a
>>>>>>>>>> case
>>>>>>>>>> when
>>>>>>>>>> more than 1 message should be returned from a combiner?  I know
>>>>>>>>>> that
>>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>>> functionality
>>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount
>>>>>>>>>> of work
>>>>>>>>>> on
>>>>>>>>>> our side.
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>>>
>>>>>>>>>>> To clarify, I was not discussing the possibility for combine to
>>>>>>>>>>> return
>>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>>> although i agree with Jakob, I also believe returning null should
>>>>>>>>>>> be
>>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>>> reference/pointer.
>>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let
>>>>>>>>>>> combine()
>>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine
>>>>>>>>>>> to
>>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the
>>>>>>>>>>> model,
>>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>>
>>>>>>>>>>> As far as the size of the messages parameter, I agree with you
>>>>>>>>>>> that 0
>>>>>>>>>>> messages gives nothing to combine and it would be somehow
>>>>>>>>>>> awkward, it
>>>>>>>>>>> was more a matter of synching it with the other methods getting
>>>>>>>>>>> the
>>>>>>>>>>> messages parameter.
>>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>>
>>>>>>>>>>> What do you think?
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity
>>>>>>>>>>>> to the
>>>>>>>>>>>> calling code (null checks, or not, since people usually will
>>>>>>>>>>>> forget
>>>>>>>>>>>> them).  Avery is correct that combiners are application
>>>>>>>>>>>> specific.  Is
>>>>>>>>>>>> it conceivable that one would want to write a combiner that
>>>>>>>>>>>> returned
>>>>>>>>>>>> something for an input of no parameters, ie combining the empty
>>>>>>>>>>>> list
>>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>>
>>>>>>>>>>>>>  /**
>>>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>>>   *
>>>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these
>>>>>>>>>>>>> messages
>>>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or
>>>>>>>>>>>>> null if
>>>>>>>>>>>>> no
>>>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>>>   */
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>>> support
>>>>>>>>>>>>> various use cases.  A combiner should be particular to a
>>>>>>>>>>>>> particular
>>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null
>>>>>>>>>>>>> from
>>>>>>>>>>>>> a
>>>>>>>>>>>>> combiner, in that case, no message should be sent to that
>>>>>>>>>>>>> vertex.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when
>>>>>>>>>>>>> there
>>>>>>>>>>>>> are
>>>>>>>>>>>>> 0
>>>>>>>>>>>>> messages.  I can't see a case where that would be useful.
>>>>>>>>>>>>>  Perhaps we
>>>>>>>>>>>>> should
>>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least
>>>>>>>>>>>>> one
>>>>>>>>>>>>> message
>>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Avery
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we
>>>>>>>>>>>>>> want to
>>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive
>>>>>>>>>>>>>> 0
>>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> null return semantics of combine() is more suitable for this
>>>>>>>>>>>>>> exact
>>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian
>>>>>>>>>>>>>> Schelter<ss...@apache.org>
>>>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least
>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging
>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>> only
>>>>>>>>>>>>>>> invokes the combiner only if at least one message for the
>>>>>>>>>>>>>>> target
>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> However, we should not rely on implicit implementation
>>>>>>>>>>>>>>> details but
>>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following
>>>>>>>>>>>>>>>> reason:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there
>>>>>>>>>>>>>>>> are no
>>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for
>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for
>>>>>>>>>>>>>>>> each
>>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>     public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>>       Iterable<M>        messages =
>>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>>>                       combinedMsg =
>>>>>>>>>>>>>>>> combiner.combine(vertexId,
>>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>>>                       List<M>        tmp = new
>>>>>>>>>>>>>>>> ArrayList<M>(1);
>>>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that
>>>>>>>>>>>>>>>> returns
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the
>>>>>>>>>>>>>>>> combiner
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>    Claudio Martella
>>>>>>>>>    claudio.martella@gmail.com
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    Claudio Martella
>>>>>>    claudio.martella@gmail.com
>>>>
>>>>
>>>>
>>>> --
>>>>    Claudio Martella
>>>>    claudio.martella@gmail.com
>>
>>
>>
>

Re: on the semantics of the combiner

Posted by Avery Ching <ac...@apache.org>.

The general idea of combiners is to reduce the number of messages sent.  
Combiners are purely an optimization and the application should work 
correctly without it (since it's never guaranteed to actually be 
called).  Combiners can only modify the messages sent to a single 
vertex, so they can't send messages to other vertices.  Any other work 
(i.e. sending messages) should be done by the vertex in the compute() 
method.

While I think that grouping behavior could actually be implemented 
within a message object (still reducing the number of messages to 1 or 
0) I suppose that in some simple cases (i.e. grouping), it might be 
easier by doing it in the combiner as you both have mentioned?  The only 
thing I suppose I'm concerned about is letting users do something that 
is not optimal.  Generally, expanding messages is not what you want your 
combiner to do.  Also, since grouping behavior can be implemented in the 
message object, it forces users to avoid shooting themselves in the foot.

Good discussion (it's making me really think about this)!

Avery

On 1/10/12 10:32 AM, Claudio Martella wrote:
> Ok, now i see where you're going. I guess that the thing here is that
> the combiner would "act" like (on its behalf) D, and to do so
> concretely it would probably need some local data related to D (edges
> values? vertexvalue?).
> I also think that k>  n is also possible in principle and we could let
> the user decide whether to use this power or not, once/if we agree
> that letting the user send k messages in the combiner is useful (and
> the grouping behavior shown by the label propagation example should do
> so).
>
> On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan<jg...@gmail.com>  wrote:
>> Those two messages would have gone to D, been expanded to, say, 4,
>> which would have then then been sent to, say, M.  This would save the
>> sending of the two to D and send the 4 directly to M.  I'm not saying
>> it's a great example, but it is legal.  This is of course assuming
>> that combiners can generate messages bound for vertices other than the
>> original destination, which I don't know if that has even been
>> discussed.
>>
>> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
>> <cl...@gmail.com>  wrote:
>>> i'm not sure i understand what you'd save here. if the two messages
>>> were going to be expanded to k messages on the destination worker D,
>>> but you expand them on W, you end up sending k messages instead of 2.
>>> right?
>>>
>>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>> the combiner, can still be smaller than n,
>>>> Right.  Grouping would be the most common case.  It would be possible
>>>> to be great than k, as well.  For instance, consider two messages,
>>>> both generated on the same worker (W) by two two different vertices,
>>>> both bound for another vertex, Z.  A combiner on W could get both of
>>>> these messages, do some work on them, as it would have knowledge of
>>>> both, and generate some arbitrary number of messages bound for other
>>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>>
>>>>
>>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>>> <cl...@gmail.com>  wrote:
>>>>> it doesn't have to be expand, k, the number of elements returned by
>>>>> the combiner, can still be smaller than n, the size of the messages
>>>>> parameter. as a first example, you can imagine your vertex receiving
>>>>> semantically-different classes/types of messages, and you can imagine
>>>>> willing to be summarizing them in different messages, i.e. if your
>>>>> messages come along with labels or just simply by the source vertex,
>>>>> if required by the algorithm, think of label propagation to have just
>>>>> an example, or some sort of labeled-pagerank.
>>>>>
>>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching<ac...@apache.org>  wrote:
>>>>>> I agree that C&A doesn't require it, however, I can't think of why I would
>>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>> In my opinion that means reducing to a single message or none at all.
>>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>>> instance, doesn't require a single  or no value to be returned; it has
>>>>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>>> (possibly empty) make it more useful?
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>>> <cl...@gmail.com>    wrote:
>>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>    wrote:
>>>>>>>>> Combiners should be commutative and associative.  In my opinion that
>>>>>>>>> means
>>>>>>>>> reducing to a single message or none at all.  Can you think of a case
>>>>>>>>> when
>>>>>>>>> more than 1 message should be returned from a combiner?  I know that
>>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>>> functionality
>>>>>>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>>>>>>> on
>>>>>>>>> our side.
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>>> although i agree with Jakob, I also believe returning null should be
>>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>>> reference/pointer.
>>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>>>>>>> but maybe it's too much.
>>>>>>>>>>
>>>>>>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>>>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>>>>>>> was more a matter of synching it with the other methods getting the
>>>>>>>>>> messages parameter.
>>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>>
>>>>>>>>>> What do you think?
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>>   wrote:
>>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>>   wrote:
>>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>>
>>>>>>>>>>>>   /**
>>>>>>>>>>>>    * Combines message values for a particular vertex index.
>>>>>>>>>>>>    *
>>>>>>>>>>>>    * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>>>>>    * @param msgList List of the messages to be combined
>>>>>>>>>>>>    * @return Message that is combined from {@link MsgList} or null if
>>>>>>>>>>>> no
>>>>>>>>>>>>    *         message it to be sent
>>>>>>>>>>>>    * @throws IOException
>>>>>>>>>>>>    */
>>>>>>>>>>>>
>>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>>> support
>>>>>>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>>>>>>> a
>>>>>>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>>>>>>
>>>>>>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>>>>>>> are
>>>>>>>>>>>> 0
>>>>>>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>>>>>>> should
>>>>>>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>>>>>>> message
>>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>>
>>>>>>>>>>>> Avery
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>>
>>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>>> putMessages()
>>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>>> bugged.
>>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>>>>>>> the
>>>>>>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>>>>>>> situation.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>>>>>>   wrote:
>>>>>>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>>>>>>> only
>>>>>>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>      public Iterable<M>        getMessages(I vertexId) {
>>>>>>>>>>>>>>>        Iterable<M>        messages =
>>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>>        if (combiner != null) {
>>>>>>>>>>>>>>>                M combinedMsg;
>>>>>>>>>>>>>>>                try {
>>>>>>>>>>>>>>>                        combinedMsg = combiner.combine(vertexId,
>>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>>>>>>>                        throw new RuntimeException("could not
>>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>>>>>>>                        List<M>        tmp = new ArrayList<M>(1);
>>>>>>>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>>>>>>>                        messages = tmp;
>>>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>>>                        messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>>                }
>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>        return messages;
>>>>>>>>>>>>>>>      }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>>     Claudio Martella
>>>>>>>>     claudio.martella@gmail.com
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>     Claudio Martella
>>>>>     claudio.martella@gmail.com
>>>
>>>
>>> --
>>>     Claudio Martella
>>>     claudio.martella@gmail.com
>
>

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

Ok, now i see where you're going. I guess that the thing here is that
the combiner would "act" like (on its behalf) D, and to do so
concretely it would probably need some local data related to D (edges
values? vertexvalue?).
I also think that k > n is also possible in principle and we could let
the user decide whether to use this power or not, once/if we agree
that letting the user send k messages in the combiner is useful (and
the grouping behavior shown by the label propagation example should do
so).

On Tue, Jan 10, 2012 at 7:04 PM, Jakob Homan <jg...@gmail.com> wrote:
> Those two messages would have gone to D, been expanded to, say, 4,
> which would have then then been sent to, say, M.  This would save the
> sending of the two to D and send the 4 directly to M.  I'm not saying
> it's a great example, but it is legal.  This is of course assuming
> that combiners can generate messages bound for vertices other than the
> original destination, which I don't know if that has even been
> discussed.
>
> On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
> <cl...@gmail.com> wrote:
>> i'm not sure i understand what you'd save here. if the two messages
>> were going to be expanded to k messages on the destination worker D,
>> but you expand them on W, you end up sending k messages instead of 2.
>> right?
>>
>> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan <jg...@gmail.com> wrote:
>>>> it doesn't have to be expand, k, the number of elements returned by
>>>> the combiner, can still be smaller than n,
>>> Right.  Grouping would be the most common case.  It would be possible
>>> to be great than k, as well.  For instance, consider two messages,
>>> both generated on the same worker (W) by two two different vertices,
>>> both bound for another vertex, Z.  A combiner on W could get both of
>>> these messages, do some work on them, as it would have knowledge of
>>> both, and generate some arbitrary number of messages bound for other
>>> vertices (thus saving the shuffle/transfer of the original messages).
>>>
>>>
>>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>>> <cl...@gmail.com> wrote:
>>>> it doesn't have to be expand, k, the number of elements returned by
>>>> the combiner, can still be smaller than n, the size of the messages
>>>> parameter. as a first example, you can imagine your vertex receiving
>>>> semantically-different classes/types of messages, and you can imagine
>>>> willing to be summarizing them in different messages, i.e. if your
>>>> messages come along with labels or just simply by the source vertex,
>>>> if required by the algorithm, think of label propagation to have just
>>>> an example, or some sort of labeled-pagerank.
>>>>
>>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <ac...@apache.org> wrote:
>>>>> I agree that C&A doesn't require it, however, I can't think of why I would
>>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>>
>>>>> Avery
>>>>>
>>>>>
>>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>>
>>>>>>> In my opinion that means reducing to a single message or none at all.
>>>>>>
>>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>>> instance, doesn't require a single  or no value to be returned; it has
>>>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>>> (possibly empty) make it more useful?
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>>> <cl...@gmail.com>  wrote:
>>>>>>>
>>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>>>>>>>
>>>>>>>> Combiners should be commutative and associative.  In my opinion that
>>>>>>>> means
>>>>>>>> reducing to a single message or none at all.  Can you think of a case
>>>>>>>> when
>>>>>>>> more than 1 message should be returned from a combiner?  I know that
>>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>>> functionality
>>>>>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>>>>>> on
>>>>>>>> our side.
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>>
>>>>>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>>> although i agree with Jakob, I also believe returning null should be
>>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>>> reference/pointer.
>>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>>>>>> but maybe it's too much.
>>>>>>>>>
>>>>>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>>>>>> was more a matter of synching it with the other methods getting the
>>>>>>>>> messages parameter.
>>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>>
>>>>>>>>> What do you think?
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>>
>>>>>>>>>>>  /**
>>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>>   *
>>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or null if
>>>>>>>>>>> no
>>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>>   * @throws IOException
>>>>>>>>>>>   */
>>>>>>>>>>>
>>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>>> support
>>>>>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>>>>>> a
>>>>>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>>>>>
>>>>>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>>>>>> are
>>>>>>>>>>> 0
>>>>>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>>>>>> should
>>>>>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>>>>>> message
>>>>>>>>>>> to have combine() being called.
>>>>>>>>>>>
>>>>>>>>>>> Avery
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>>
>>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>>> putMessages()
>>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>>
>>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>>> bugged.
>>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>>>>>> the
>>>>>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>>>>>> situation.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>>>>>  wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>>>>>> only
>>>>>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>>>>>> vertex
>>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>>>>>> vertex
>>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>     public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>>>>>>       Iterable<M>      messages =
>>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>>> combine",
>>>>>>>>>>>>>> e);
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>>                       List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>>               }
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>>    Claudio Martella
>>>>>>>    claudio.martella@gmail.com
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>>    Claudio Martella
>>>>    claudio.martella@gmail.com
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

Those two messages would have gone to D, been expanded to, say, 4,
which would have then then been sent to, say, M.  This would save the
sending of the two to D and send the 4 directly to M.  I'm not saying
it's a great example, but it is legal.  This is of course assuming
that combiners can generate messages bound for vertices other than the
original destination, which I don't know if that has even been
discussed.

On Tue, Jan 10, 2012 at 9:49 AM, Claudio Martella
<cl...@gmail.com> wrote:
> i'm not sure i understand what you'd save here. if the two messages
> were going to be expanded to k messages on the destination worker D,
> but you expand them on W, you end up sending k messages instead of 2.
> right?
>
> On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan <jg...@gmail.com> wrote:
>>> it doesn't have to be expand, k, the number of elements returned by
>>> the combiner, can still be smaller than n,
>> Right.  Grouping would be the most common case.  It would be possible
>> to be great than k, as well.  For instance, consider two messages,
>> both generated on the same worker (W) by two two different vertices,
>> both bound for another vertex, Z.  A combiner on W could get both of
>> these messages, do some work on them, as it would have knowledge of
>> both, and generate some arbitrary number of messages bound for other
>> vertices (thus saving the shuffle/transfer of the original messages).
>>
>>
>> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
>> <cl...@gmail.com> wrote:
>>> it doesn't have to be expand, k, the number of elements returned by
>>> the combiner, can still be smaller than n, the size of the messages
>>> parameter. as a first example, you can imagine your vertex receiving
>>> semantically-different classes/types of messages, and you can imagine
>>> willing to be summarizing them in different messages, i.e. if your
>>> messages come along with labels or just simply by the source vertex,
>>> if required by the algorithm, think of label propagation to have just
>>> an example, or some sort of labeled-pagerank.
>>>
>>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <ac...@apache.org> wrote:
>>>> I agree that C&A doesn't require it, however, I can't think of why I would
>>>> want to use a combiner to expand the number of messages.  Can you?
>>>>
>>>> Avery
>>>>
>>>>
>>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>>
>>>>>> In my opinion that means reducing to a single message or none at all.
>>>>>
>>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>>> instance, doesn't require a single  or no value to be returned; it has
>>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>>> the semantics of Giraph's combiner to return a list of messages
>>>>> (possibly empty) make it more useful?
>>>>>
>>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>>> <cl...@gmail.com>  wrote:
>>>>>>
>>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>>>>>>
>>>>>>> Combiners should be commutative and associative.  In my opinion that
>>>>>>> means
>>>>>>> reducing to a single message or none at all.  Can you think of a case
>>>>>>> when
>>>>>>> more than 1 message should be returned from a combiner?  I know that
>>>>>>> returning null isn't preferable in general, but I think that
>>>>>>> functionality
>>>>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>>>>> on
>>>>>>> our side.
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>>
>>>>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>>> although i agree with Jakob, I also believe returning null should be
>>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>>> reference/pointer.
>>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>>>>> but maybe it's too much.
>>>>>>>>
>>>>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>>>>> was more a matter of synching it with the other methods getting the
>>>>>>>> messages parameter.
>>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>>
>>>>>>>> What do you think?
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>>> combining a single message would result in that message.
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>>
>>>>>>>>>>  /**
>>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>>   *
>>>>>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>>   * @return Message that is combined from {@link MsgList} or null if
>>>>>>>>>> no
>>>>>>>>>>   *         message it to be sent
>>>>>>>>>>   * @throws IOException
>>>>>>>>>>   */
>>>>>>>>>>
>>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>>> support
>>>>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>>>>> a
>>>>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>>>>
>>>>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>>>>> are
>>>>>>>>>> 0
>>>>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>>>>> should
>>>>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>>>>> message
>>>>>>>>>> to have combine() being called.
>>>>>>>>>>
>>>>>>>>>> Avery
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>>
>>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>>> putMessages()
>>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>>
>>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>>> bugged.
>>>>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>>>>> the
>>>>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>>>>> situation.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>>>>  wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>>>>> only
>>>>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>>>>> vertex
>>>>>>>>>>>> has been sent.
>>>>>>>>>>>>
>>>>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>>
>>>>>>>>>>>> --sebastian
>>>>>>>>>>>>
>>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>>
>>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>>>>
>>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>>
>>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>>>>> vertex
>>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>>
>>>>>>>>>>>>>     public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>>>>>       Iterable<M>      messages =
>>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>>               try {
>>>>>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>>>>>> messages);
>>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>>> combine",
>>>>>>>>>>>>> e);
>>>>>>>>>>>>>               }
>>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>>                       List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>>               } else {
>>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>>               }
>>>>>>>>>>>>>       }
>>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>>>>> a
>>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>>>>> with
>>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>>    Claudio Martella
>>>>>>    claudio.martella@gmail.com
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

i'm not sure i understand what you'd save here. if the two messages
were going to be expanded to k messages on the destination worker D,
but you expand them on W, you end up sending k messages instead of 2.
right?

On Tue, Jan 10, 2012 at 6:26 PM, Jakob Homan <jg...@gmail.com> wrote:
>> it doesn't have to be expand, k, the number of elements returned by
>> the combiner, can still be smaller than n,
> Right.  Grouping would be the most common case.  It would be possible
> to be great than k, as well.  For instance, consider two messages,
> both generated on the same worker (W) by two two different vertices,
> both bound for another vertex, Z.  A combiner on W could get both of
> these messages, do some work on them, as it would have knowledge of
> both, and generate some arbitrary number of messages bound for other
> vertices (thus saving the shuffle/transfer of the original messages).
>
>
> On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
> <cl...@gmail.com> wrote:
>> it doesn't have to be expand, k, the number of elements returned by
>> the combiner, can still be smaller than n, the size of the messages
>> parameter. as a first example, you can imagine your vertex receiving
>> semantically-different classes/types of messages, and you can imagine
>> willing to be summarizing them in different messages, i.e. if your
>> messages come along with labels or just simply by the source vertex,
>> if required by the algorithm, think of label propagation to have just
>> an example, or some sort of labeled-pagerank.
>>
>> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <ac...@apache.org> wrote:
>>> I agree that C&A doesn't require it, however, I can't think of why I would
>>> want to use a combiner to expand the number of messages.  Can you?
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>>
>>>>> In my opinion that means reducing to a single message or none at all.
>>>>
>>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>>> instance, doesn't require a single  or no value to be returned; it has
>>>> the same interface as a reducer, zero or more values.  Would adapting
>>>> the semantics of Giraph's combiner to return a list of messages
>>>> (possibly empty) make it more useful?
>>>>
>>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>>> <cl...@gmail.com>  wrote:
>>>>>
>>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>>
>>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>>>>>
>>>>>> Combiners should be commutative and associative.  In my opinion that
>>>>>> means
>>>>>> reducing to a single message or none at all.  Can you think of a case
>>>>>> when
>>>>>> more than 1 message should be returned from a combiner?  I know that
>>>>>> returning null isn't preferable in general, but I think that
>>>>>> functionality
>>>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>>>> on
>>>>>> our side.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>>
>>>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>>> although i agree with Jakob, I also believe returning null should be
>>>>>>> avoided but only used, roughly, as an init value for a
>>>>>>> reference/pointer.
>>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>>>> but maybe it's too much.
>>>>>>>
>>>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>>>> was more a matter of synching it with the other methods getting the
>>>>>>> messages parameter.
>>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>>
>>>>>>> What do you think?
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>>> combining a single message would result in that message.
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>>
>>>>>>>>>  /**
>>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>>   *
>>>>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>>   * @return Message that is combined from {@link MsgList} or null if
>>>>>>>>> no
>>>>>>>>>   *         message it to be sent
>>>>>>>>>   * @throws IOException
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>>> support
>>>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>>>> a
>>>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>>>
>>>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>>>> are
>>>>>>>>> 0
>>>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>>>> should
>>>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>>>> message
>>>>>>>>> to have combine() being called.
>>>>>>>>>
>>>>>>>>> Avery
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Sebastian,
>>>>>>>>>>
>>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>>> putMessages()
>>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>>
>>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>>> bugged.
>>>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>>>> the
>>>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>>>> situation.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>>>  wrote:
>>>>>>>>>>>
>>>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>>>> only
>>>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>>>> vertex
>>>>>>>>>>> has been sent.
>>>>>>>>>>>
>>>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>>
>>>>>>>>>>> --sebastian
>>>>>>>>>>>
>>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hello list,
>>>>>>>>>>>>
>>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>>>
>>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>>
>>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>>>> vertex
>>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>>
>>>>>>>>>>>>     public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>>>>       Iterable<M>      messages =
>>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>>               try {
>>>>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>>>>> messages);
>>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>>> combine",
>>>>>>>>>>>> e);
>>>>>>>>>>>>               }
>>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>>                       List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>>               } else {
>>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>>               }
>>>>>>>>>>>>       }
>>>>>>>>>>>>       return messages;
>>>>>>>>>>>>     }
>>>>>>>>>>>>
>>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>>>> a
>>>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>>>> with
>>>>>>>>>>>> 0 messages?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>    Claudio Martella
>>>>>    claudio.martella@gmail.com
>>>
>>>
>>
>>
>>
>> --
>>    Claudio Martella
>>    claudio.martella@gmail.com



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

> it doesn't have to be expand, k, the number of elements returned by
> the combiner, can still be smaller than n,
Right.  Grouping would be the most common case.  It would be possible
to be great than k, as well.  For instance, consider two messages,
both generated on the same worker (W) by two two different vertices,
both bound for another vertex, Z.  A combiner on W could get both of
these messages, do some work on them, as it would have knowledge of
both, and generate some arbitrary number of messages bound for other
vertices (thus saving the shuffle/transfer of the original messages).


On Tue, Jan 10, 2012 at 12:08 AM, Claudio Martella
<cl...@gmail.com> wrote:
> it doesn't have to be expand, k, the number of elements returned by
> the combiner, can still be smaller than n, the size of the messages
> parameter. as a first example, you can imagine your vertex receiving
> semantically-different classes/types of messages, and you can imagine
> willing to be summarizing them in different messages, i.e. if your
> messages come along with labels or just simply by the source vertex,
> if required by the algorithm, think of label propagation to have just
> an example, or some sort of labeled-pagerank.
>
> On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <ac...@apache.org> wrote:
>> I agree that C&A doesn't require it, however, I can't think of why I would
>> want to use a combiner to expand the number of messages.  Can you?
>>
>> Avery
>>
>>
>> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>>
>>>> In my opinion that means reducing to a single message or none at all.
>>>
>>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>>> instance, doesn't require a single  or no value to be returned; it has
>>> the same interface as a reducer, zero or more values.  Would adapting
>>> the semantics of Giraph's combiner to return a list of messages
>>> (possibly empty) make it more useful?
>>>
>>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>>> <cl...@gmail.com>  wrote:
>>>>
>>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>>
>>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>>>>
>>>>> Combiners should be commutative and associative.  In my opinion that
>>>>> means
>>>>> reducing to a single message or none at all.  Can you think of a case
>>>>> when
>>>>> more than 1 message should be returned from a combiner?  I know that
>>>>> returning null isn't preferable in general, but I think that
>>>>> functionality
>>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>>> on
>>>>> our side.
>>>>>
>>>>> Avery
>>>>>
>>>>>
>>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>>
>>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>>> null. I see why it would be useful, given that combine returns M,
>>>>>> there's no other way to let combiner ask not to send any message,
>>>>>> although i agree with Jakob, I also believe returning null should be
>>>>>> avoided but only used, roughly, as an init value for a
>>>>>> reference/pointer.
>>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>>> but maybe it's too much.
>>>>>>
>>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>>> was more a matter of synching it with the other methods getting the
>>>>>> messages parameter.
>>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>>
>>>>>> What do you think?
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>>  wrote:
>>>>>>>
>>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>>> combining a single message would result in that message.
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>>  wrote:
>>>>>>>>
>>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>>
>>>>>>>>  /**
>>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>>   *
>>>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>>   * @return Message that is combined from {@link MsgList} or null if
>>>>>>>> no
>>>>>>>>   *         message it to be sent
>>>>>>>>   * @throws IOException
>>>>>>>>   */
>>>>>>>>
>>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>>> support
>>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>>> a
>>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>>
>>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>>> are
>>>>>>>> 0
>>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>>> should
>>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>>> message
>>>>>>>> to have combine() being called.
>>>>>>>>
>>>>>>>> Avery
>>>>>>>>
>>>>>>>>
>>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>>
>>>>>>>>> Hi Sebastian,
>>>>>>>>>
>>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>>> putMessages()
>>>>>>>>> and compute(), and act accordingly.
>>>>>>>>>
>>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>>> bugged.
>>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>>> the
>>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>>> situation.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>>> only
>>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>>> vertex
>>>>>>>>>> has been sent.
>>>>>>>>>>
>>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>>
>>>>>>>>>> --sebastian
>>>>>>>>>>
>>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello list,
>>>>>>>>>>>
>>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>>
>>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>>
>>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>>> vertex
>>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>>
>>>>>>>>>>>     public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>>>       Iterable<M>      messages =
>>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>>               try {
>>>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>>>> messages);
>>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>>> combine",
>>>>>>>>>>> e);
>>>>>>>>>>>               }
>>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>>                       List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>>               } else {
>>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>>               }
>>>>>>>>>>>       }
>>>>>>>>>>>       return messages;
>>>>>>>>>>>     }
>>>>>>>>>>>
>>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>>> a
>>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>>> with
>>>>>>>>>>> 0 messages?
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>
>>>>
>>>>
>>>> --
>>>>    Claudio Martella
>>>>    claudio.martella@gmail.com
>>
>>
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

it doesn't have to be expand, k, the number of elements returned by
the combiner, can still be smaller than n, the size of the messages
parameter. as a first example, you can imagine your vertex receiving
semantically-different classes/types of messages, and you can imagine
willing to be summarizing them in different messages, i.e. if your
messages come along with labels or just simply by the source vertex,
if required by the algorithm, think of label propagation to have just
an example, or some sort of labeled-pagerank.

On Tue, Jan 10, 2012 at 3:05 AM, Avery Ching <ac...@apache.org> wrote:
> I agree that C&A doesn't require it, however, I can't think of why I would
> want to use a combiner to expand the number of messages.  Can you?
>
> Avery
>
>
> On 1/9/12 3:57 PM, Jakob Homan wrote:
>>>
>>> In my opinion that means reducing to a single message or none at all.
>>
>> C&A doesn't require this, however.  Hadoop's combiner interface, for
>> instance, doesn't require a single  or no value to be returned; it has
>> the same interface as a reducer, zero or more values.  Would adapting
>> the semantics of Giraph's combiner to return a list of messages
>> (possibly empty) make it more useful?
>>
>> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
>> <cl...@gmail.com>  wrote:
>>>
>>> Yes, what is you say is completely reasonable, you convinced me :)
>>>
>>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>>>
>>>> Combiners should be commutative and associative.  In my opinion that
>>>> means
>>>> reducing to a single message or none at all.  Can you think of a case
>>>> when
>>>> more than 1 message should be returned from a combiner?  I know that
>>>> returning null isn't preferable in general, but I think that
>>>> functionality
>>>> (returning no messages), is nice to have and isn't a huge amount of work
>>>> on
>>>> our side.
>>>>
>>>> Avery
>>>>
>>>>
>>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>>>
>>>>> To clarify, I was not discussing the possibility for combine to return
>>>>> null. I see why it would be useful, given that combine returns M,
>>>>> there's no other way to let combiner ask not to send any message,
>>>>> although i agree with Jakob, I also believe returning null should be
>>>>> avoided but only used, roughly, as an init value for a
>>>>> reference/pointer.
>>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>>> but maybe it's too much.
>>>>>
>>>>> As far as the size of the messages parameter, I agree with you that 0
>>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>>> was more a matter of synching it with the other methods getting the
>>>>> messages parameter.
>>>>> Probably, having a more clear javadoc will do the job here.
>>>>>
>>>>> What do you think?
>>>>>
>>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>
>>>>>  wrote:
>>>>>>
>>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>>> calling code (null checks, or not, since people usually will forget
>>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>>> it conceivable that one would want to write a combiner that returned
>>>>>> something for an input of no parameters, ie combining the empty list
>>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>>> combining a single message would result in that message.
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>
>>>>>>  wrote:
>>>>>>>
>>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>>
>>>>>>>  /**
>>>>>>>   * Combines message values for a particular vertex index.
>>>>>>>   *
>>>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>>>   * @param msgList List of the messages to be combined
>>>>>>>   * @return Message that is combined from {@link MsgList} or null if
>>>>>>> no
>>>>>>>   *         message it to be sent
>>>>>>>   * @throws IOException
>>>>>>>   */
>>>>>>>
>>>>>>> I think we are somewhat vague on what a combiner can return to
>>>>>>> support
>>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>>> compute() algorithm.  I think it should be legal to return null from
>>>>>>> a
>>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>>
>>>>>>> It seems like it would be an overhead to call a combiner when there
>>>>>>> are
>>>>>>> 0
>>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>>> should
>>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>>> message
>>>>>>> to have combine() being called.
>>>>>>>
>>>>>>> Avery
>>>>>>>
>>>>>>>
>>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>>>
>>>>>>>> Hi Sebastian,
>>>>>>>>
>>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>>> define explicitly the semantics of this scenario.
>>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>>> messages, as it's the case of BasicVertex::initialize(),
>>>>>>>> putMessages()
>>>>>>>> and compute(), and act accordingly.
>>>>>>>>
>>>>>>>> In the particular example, I believe the SimpleSumCombiner is
>>>>>>>> bugged.
>>>>>>>> It's true that the sum of no values is 0, but it's also true that
>>>>>>>> the
>>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>>> situation.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>>  wrote:
>>>>>>>>>
>>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>>> only
>>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>>> vertex
>>>>>>>>> has been sent.
>>>>>>>>>
>>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>>
>>>>>>>>> --sebastian
>>>>>>>>>
>>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>>>
>>>>>>>>>> Hello list,
>>>>>>>>>>
>>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>>
>>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>>
>>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>>> vertex
>>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>>
>>>>>>>>>>     public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>>       Iterable<M>      messages =
>>>>>>>>>> inMessages.getMessages(vertexId);
>>>>>>>>>>       if (combiner != null) {
>>>>>>>>>>               M combinedMsg;
>>>>>>>>>>               try {
>>>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>>>> messages);
>>>>>>>>>>               }  catch (IOException e) {
>>>>>>>>>>                       throw new RuntimeException("could not
>>>>>>>>>> combine",
>>>>>>>>>> e);
>>>>>>>>>>               }
>>>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>>>                       List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>>>                       messages = tmp;
>>>>>>>>>>               } else {
>>>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>>>               }
>>>>>>>>>>       }
>>>>>>>>>>       return messages;
>>>>>>>>>>     }
>>>>>>>>>>
>>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns
>>>>>>>>>> a
>>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>>> with
>>>>>>>>>> 0 messages?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>
>>>
>>> --
>>>    Claudio Martella
>>>    claudio.martella@gmail.com
>
>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Avery Ching <ac...@apache.org>.

I agree that C&A doesn't require it, however, I can't think of why I 
would want to use a combiner to expand the number of messages.  Can you?

Avery

On 1/9/12 3:57 PM, Jakob Homan wrote:
>> In my opinion that means reducing to a single message or none at all.
> C&A doesn't require this, however.  Hadoop's combiner interface, for
> instance, doesn't require a single  or no value to be returned; it has
> the same interface as a reducer, zero or more values.  Would adapting
> the semantics of Giraph's combiner to return a list of messages
> (possibly empty) make it more useful?
>
> On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
> <cl...@gmail.com>  wrote:
>> Yes, what is you say is completely reasonable, you convinced me :)
>>
>> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching<ac...@apache.org>  wrote:
>>> Combiners should be commutative and associative.  In my opinion that means
>>> reducing to a single message or none at all.  Can you think of a case when
>>> more than 1 message should be returned from a combiner?  I know that
>>> returning null isn't preferable in general, but I think that functionality
>>> (returning no messages), is nice to have and isn't a huge amount of work on
>>> our side.
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>> To clarify, I was not discussing the possibility for combine to return
>>>> null. I see why it would be useful, given that combine returns M,
>>>> there's no other way to let combiner ask not to send any message,
>>>> although i agree with Jakob, I also believe returning null should be
>>>> avoided but only used, roughly, as an init value for a
>>>> reference/pointer.
>>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>>> return Iterable<M>, basicallly letting it define what to combine to
>>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>>> but maybe it's too much.
>>>>
>>>> As far as the size of the messages parameter, I agree with you that 0
>>>> messages gives nothing to combine and it would be somehow awkward, it
>>>> was more a matter of synching it with the other methods getting the
>>>> messages parameter.
>>>> Probably, having a more clear javadoc will do the job here.
>>>>
>>>> What do you think?
>>>>
>>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>    wrote:
>>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>>> calling code (null checks, or not, since people usually will forget
>>>>> them).  Avery is correct that combiners are application specific.  Is
>>>>> it conceivable that one would want to write a combiner that returned
>>>>> something for an input of no parameters, ie combining the empty list
>>>>> doesn't return the empty list?  I imagine for most combiners,
>>>>> combining a single message would result in that message.
>>>>>
>>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>    wrote:
>>>>>> The javadoc for VertexCombiner#combine() is
>>>>>>
>>>>>>   /**
>>>>>>    * Combines message values for a particular vertex index.
>>>>>>    *
>>>>>>    * @param vertexIndex Index of the vertex getting these messages
>>>>>>    * @param msgList List of the messages to be combined
>>>>>>    * @return Message that is combined from {@link MsgList} or null if no
>>>>>>    *         message it to be sent
>>>>>>    * @throws IOException
>>>>>>    */
>>>>>>
>>>>>> I think we are somewhat vague on what a combiner can return to support
>>>>>> various use cases.  A combiner should be particular to a particular
>>>>>> compute() algorithm.  I think it should be legal to return null from a
>>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>>
>>>>>> It seems like it would be an overhead to call a combiner when there are
>>>>>> 0
>>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>>> should
>>>>>> change the javadoc to insure that msgList must contain at least one
>>>>>> message
>>>>>> to have combine() being called.
>>>>>>
>>>>>> Avery
>>>>>>
>>>>>>
>>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>> Hi Sebastian,
>>>>>>>
>>>>>>> yes, that was my point, I agree completely with you.
>>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>>> define explicitly the semantics of this scenario.
>>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>>>>> and compute(), and act accordingly.
>>>>>>>
>>>>>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>>>>>> It's true that the sum of no values is 0, but it's also true that the
>>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>>> situation.
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>>   wrote:
>>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>>> only
>>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>>> vertex
>>>>>>>> has been sent.
>>>>>>>>
>>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>>
>>>>>>>> --sebastian
>>>>>>>>
>>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>> Hello list,
>>>>>>>>>
>>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>>
>>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>>
>>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>>> vertex
>>>>>>>>> an Iterable, which can be empty:
>>>>>>>>>
>>>>>>>>>      public Iterable<M>      getMessages(I vertexId) {
>>>>>>>>>        Iterable<M>      messages = inMessages.getMessages(vertexId);
>>>>>>>>>        if (combiner != null) {
>>>>>>>>>                M combinedMsg;
>>>>>>>>>                try {
>>>>>>>>>                        combinedMsg = combiner.combine(vertexId,
>>>>>>>>> messages);
>>>>>>>>>                }  catch (IOException e) {
>>>>>>>>>                        throw new RuntimeException("could not combine",
>>>>>>>>> e);
>>>>>>>>>                }
>>>>>>>>>                if (combinedMsg != null) {
>>>>>>>>>                        List<M>      tmp = new ArrayList<M>(1);
>>>>>>>>>                        tmp.add(combinedMsg);
>>>>>>>>>                        messages = tmp;
>>>>>>>>>                } else {
>>>>>>>>>                        messages = new ArrayList<M>(0);
>>>>>>>>>                }
>>>>>>>>>        }
>>>>>>>>>        return messages;
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>>> with
>>>>>>>>> 0 messages?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>
>>
>>
>> --
>>     Claudio Martella
>>     claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

> In my opinion that means reducing to a single message or none at all.
C&A doesn't require this, however.  Hadoop's combiner interface, for
instance, doesn't require a single  or no value to be returned; it has
the same interface as a reducer, zero or more values.  Would adapting
the semantics of Giraph's combiner to return a list of messages
(possibly empty) make it more useful?

On Mon, Jan 9, 2012 at 3:21 PM, Claudio Martella
<cl...@gmail.com> wrote:
> Yes, what is you say is completely reasonable, you convinced me :)
>
> On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching <ac...@apache.org> wrote:
>> Combiners should be commutative and associative.  In my opinion that means
>> reducing to a single message or none at all.  Can you think of a case when
>> more than 1 message should be returned from a combiner?  I know that
>> returning null isn't preferable in general, but I think that functionality
>> (returning no messages), is nice to have and isn't a huge amount of work on
>> our side.
>>
>> Avery
>>
>>
>> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>>
>>> To clarify, I was not discussing the possibility for combine to return
>>> null. I see why it would be useful, given that combine returns M,
>>> there's no other way to let combiner ask not to send any message,
>>> although i agree with Jakob, I also believe returning null should be
>>> avoided but only used, roughly, as an init value for a
>>> reference/pointer.
>>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>>> return Iterable<M>, basicallly letting it define what to combine to
>>> ({0, 1, k } messages). It would be a powerful extension to the model,
>>> but maybe it's too much.
>>>
>>> As far as the size of the messages parameter, I agree with you that 0
>>> messages gives nothing to combine and it would be somehow awkward, it
>>> was more a matter of synching it with the other methods getting the
>>> messages parameter.
>>> Probably, having a more clear javadoc will do the job here.
>>>
>>> What do you think?
>>>
>>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>>
>>>> I'm not a big fan of returning null as it adds extra complexity to the
>>>> calling code (null checks, or not, since people usually will forget
>>>> them).  Avery is correct that combiners are application specific.  Is
>>>> it conceivable that one would want to write a combiner that returned
>>>> something for an input of no parameters, ie combining the empty list
>>>> doesn't return the empty list?  I imagine for most combiners,
>>>> combining a single message would result in that message.
>>>>
>>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>  wrote:
>>>>>
>>>>> The javadoc for VertexCombiner#combine() is
>>>>>
>>>>>  /**
>>>>>   * Combines message values for a particular vertex index.
>>>>>   *
>>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>>   * @param msgList List of the messages to be combined
>>>>>   * @return Message that is combined from {@link MsgList} or null if no
>>>>>   *         message it to be sent
>>>>>   * @throws IOException
>>>>>   */
>>>>>
>>>>> I think we are somewhat vague on what a combiner can return to support
>>>>> various use cases.  A combiner should be particular to a particular
>>>>> compute() algorithm.  I think it should be legal to return null from a
>>>>> combiner, in that case, no message should be sent to that vertex.
>>>>>
>>>>> It seems like it would be an overhead to call a combiner when there are
>>>>> 0
>>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>>> should
>>>>> change the javadoc to insure that msgList must contain at least one
>>>>> message
>>>>> to have combine() being called.
>>>>>
>>>>> Avery
>>>>>
>>>>>
>>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>>
>>>>>> Hi Sebastian,
>>>>>>
>>>>>> yes, that was my point, I agree completely with you.
>>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>>> define explicitly the semantics of this scenario.
>>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>>>> and compute(), and act accordingly.
>>>>>>
>>>>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>>>>> It's true that the sum of no values is 0, but it's also true that the
>>>>>> null return semantics of combine() is more suitable for this exact
>>>>>> situation.
>>>>>>
>>>>>>
>>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>>  wrote:
>>>>>>>
>>>>>>> I think we currently implicitly assume that there is at least one
>>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>>> only
>>>>>>> invokes the combiner only if at least one message for the target
>>>>>>> vertex
>>>>>>> has been sent.
>>>>>>>
>>>>>>> However, we should not rely on implicit implementation details but
>>>>>>> explicitly specify the semantics of combiners.
>>>>>>>
>>>>>>> --sebastian
>>>>>>>
>>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>>
>>>>>>>> Hello list,
>>>>>>>>
>>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>>> interesting problem with the combiner semantics.
>>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>>
>>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>>
>>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>>> vertex
>>>>>>>> an Iterable, which can be empty:
>>>>>>>>
>>>>>>>>     public Iterable<M>    getMessages(I vertexId) {
>>>>>>>>       Iterable<M>    messages = inMessages.getMessages(vertexId);
>>>>>>>>       if (combiner != null) {
>>>>>>>>               M combinedMsg;
>>>>>>>>               try {
>>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>>> messages);
>>>>>>>>               }  catch (IOException e) {
>>>>>>>>                       throw new RuntimeException("could not combine",
>>>>>>>> e);
>>>>>>>>               }
>>>>>>>>               if (combinedMsg != null) {
>>>>>>>>                       List<M>    tmp = new ArrayList<M>(1);
>>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>>                       messages = tmp;
>>>>>>>>               } else {
>>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>>               }
>>>>>>>>       }
>>>>>>>>       return messages;
>>>>>>>>     }
>>>>>>>>
>>>>>>>> the Iterable returned by this methods is passed to
>>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>>> with
>>>>>>>> 0 messages?
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>
>>>
>>>
>>
>
>
>
> --
>    Claudio Martella
>    claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

Yes, what is you say is completely reasonable, you convinced me :)

On Mon, Jan 9, 2012 at 11:28 PM, Avery Ching <ac...@apache.org> wrote:
> Combiners should be commutative and associative.  In my opinion that means
> reducing to a single message or none at all.  Can you think of a case when
> more than 1 message should be returned from a combiner?  I know that
> returning null isn't preferable in general, but I think that functionality
> (returning no messages), is nice to have and isn't a huge amount of work on
> our side.
>
> Avery
>
>
> On 1/9/12 12:13 PM, Claudio Martella wrote:
>>
>> To clarify, I was not discussing the possibility for combine to return
>> null. I see why it would be useful, given that combine returns M,
>> there's no other way to let combiner ask not to send any message,
>> although i agree with Jakob, I also believe returning null should be
>> avoided but only used, roughly, as an init value for a
>> reference/pointer.
>> Perhaps, we could, but i'm just thinking out loud here, let combine()
>> return Iterable<M>, basicallly letting it define what to combine to
>> ({0, 1, k } messages). It would be a powerful extension to the model,
>> but maybe it's too much.
>>
>> As far as the size of the messages parameter, I agree with you that 0
>> messages gives nothing to combine and it would be somehow awkward, it
>> was more a matter of synching it with the other methods getting the
>> messages parameter.
>> Probably, having a more clear javadoc will do the job here.
>>
>> What do you think?
>>
>> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>  wrote:
>>>
>>> I'm not a big fan of returning null as it adds extra complexity to the
>>> calling code (null checks, or not, since people usually will forget
>>> them).  Avery is correct that combiners are application specific.  Is
>>> it conceivable that one would want to write a combiner that returned
>>> something for an input of no parameters, ie combining the empty list
>>> doesn't return the empty list?  I imagine for most combiners,
>>> combining a single message would result in that message.
>>>
>>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>  wrote:
>>>>
>>>> The javadoc for VertexCombiner#combine() is
>>>>
>>>>  /**
>>>>   * Combines message values for a particular vertex index.
>>>>   *
>>>>   * @param vertexIndex Index of the vertex getting these messages
>>>>   * @param msgList List of the messages to be combined
>>>>   * @return Message that is combined from {@link MsgList} or null if no
>>>>   *         message it to be sent
>>>>   * @throws IOException
>>>>   */
>>>>
>>>> I think we are somewhat vague on what a combiner can return to support
>>>> various use cases.  A combiner should be particular to a particular
>>>> compute() algorithm.  I think it should be legal to return null from a
>>>> combiner, in that case, no message should be sent to that vertex.
>>>>
>>>> It seems like it would be an overhead to call a combiner when there are
>>>> 0
>>>> messages.  I can't see a case where that would be useful.  Perhaps we
>>>> should
>>>> change the javadoc to insure that msgList must contain at least one
>>>> message
>>>> to have combine() being called.
>>>>
>>>> Avery
>>>>
>>>>
>>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>>>
>>>>> Hi Sebastian,
>>>>>
>>>>> yes, that was my point, I agree completely with you.
>>>>> Fixing my test was not the issue, my question was whether we want to
>>>>> define explicitly the semantics of this scenario.
>>>>> Personally, I believe the combiner should be ready to receive 0
>>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>>> and compute(), and act accordingly.
>>>>>
>>>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>>>> It's true that the sum of no values is 0, but it's also true that the
>>>>> null return semantics of combine() is more suitable for this exact
>>>>> situation.
>>>>>
>>>>>
>>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>
>>>>>  wrote:
>>>>>>
>>>>>> I think we currently implicitly assume that there is at least one
>>>>>> element in the Iterable passed to the combiner. The messaging code
>>>>>> only
>>>>>> invokes the combiner only if at least one message for the target
>>>>>> vertex
>>>>>> has been sent.
>>>>>>
>>>>>> However, we should not rely on implicit implementation details but
>>>>>> explicitly specify the semantics of combiners.
>>>>>>
>>>>>> --sebastian
>>>>>>
>>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>>>
>>>>>>> Hello list,
>>>>>>>
>>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>>> interesting problem with the combiner semantics.
>>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>>
>>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>>> reason the vertices get activated at each superstep.
>>>>>>>
>>>>>>> At each superstep, under-the-hood, I pass the combiner for each
>>>>>>> vertex
>>>>>>> an Iterable, which can be empty:
>>>>>>>
>>>>>>>     public Iterable<M>    getMessages(I vertexId) {
>>>>>>>       Iterable<M>    messages = inMessages.getMessages(vertexId);
>>>>>>>       if (combiner != null) {
>>>>>>>               M combinedMsg;
>>>>>>>               try {
>>>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>>>> messages);
>>>>>>>               }  catch (IOException e) {
>>>>>>>                       throw new RuntimeException("could not combine",
>>>>>>> e);
>>>>>>>               }
>>>>>>>               if (combinedMsg != null) {
>>>>>>>                       List<M>    tmp = new ArrayList<M>(1);
>>>>>>>                       tmp.add(combinedMsg);
>>>>>>>                       messages = tmp;
>>>>>>>               } else {
>>>>>>>                       messages = new ArrayList<M>(0);
>>>>>>>               }
>>>>>>>       }
>>>>>>>       return messages;
>>>>>>>     }
>>>>>>>
>>>>>>> the Iterable returned by this methods is passed to
>>>>>>> basicVertex.putMessages() right before the compute().
>>>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>>>> sum of 0 over no values, or the framework that calls the combiner
>>>>>>> with
>>>>>>> 0 messages?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>
>>
>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Avery Ching <ac...@apache.org>.

Combiners should be commutative and associative.  In my opinion that 
means reducing to a single message or none at all.  Can you think of a 
case when more than 1 message should be returned from a combiner?  I 
know that returning null isn't preferable in general, but I think that 
functionality (returning no messages), is nice to have and isn't a huge 
amount of work on our side.

Avery

On 1/9/12 12:13 PM, Claudio Martella wrote:
> To clarify, I was not discussing the possibility for combine to return
> null. I see why it would be useful, given that combine returns M,
> there's no other way to let combiner ask not to send any message,
> although i agree with Jakob, I also believe returning null should be
> avoided but only used, roughly, as an init value for a
> reference/pointer.
> Perhaps, we could, but i'm just thinking out loud here, let combine()
> return Iterable<M>, basicallly letting it define what to combine to
> ({0, 1, k } messages). It would be a powerful extension to the model,
> but maybe it's too much.
>
> As far as the size of the messages parameter, I agree with you that 0
> messages gives nothing to combine and it would be somehow awkward, it
> was more a matter of synching it with the other methods getting the
> messages parameter.
> Probably, having a more clear javadoc will do the job here.
>
> What do you think?
>
> On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan<jg...@gmail.com>  wrote:
>> I'm not a big fan of returning null as it adds extra complexity to the
>> calling code (null checks, or not, since people usually will forget
>> them).  Avery is correct that combiners are application specific.  Is
>> it conceivable that one would want to write a combiner that returned
>> something for an input of no parameters, ie combining the empty list
>> doesn't return the empty list?  I imagine for most combiners,
>> combining a single message would result in that message.
>>
>> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching<ac...@apache.org>  wrote:
>>> The javadoc for VertexCombiner#combine() is
>>>
>>>   /**
>>>    * Combines message values for a particular vertex index.
>>>    *
>>>    * @param vertexIndex Index of the vertex getting these messages
>>>    * @param msgList List of the messages to be combined
>>>    * @return Message that is combined from {@link MsgList} or null if no
>>>    *         message it to be sent
>>>    * @throws IOException
>>>    */
>>>
>>> I think we are somewhat vague on what a combiner can return to support
>>> various use cases.  A combiner should be particular to a particular
>>> compute() algorithm.  I think it should be legal to return null from a
>>> combiner, in that case, no message should be sent to that vertex.
>>>
>>> It seems like it would be an overhead to call a combiner when there are 0
>>> messages.  I can't see a case where that would be useful.  Perhaps we should
>>> change the javadoc to insure that msgList must contain at least one message
>>> to have combine() being called.
>>>
>>> Avery
>>>
>>>
>>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>> Hi Sebastian,
>>>>
>>>> yes, that was my point, I agree completely with you.
>>>> Fixing my test was not the issue, my question was whether we want to
>>>> define explicitly the semantics of this scenario.
>>>> Personally, I believe the combiner should be ready to receive 0
>>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>>> and compute(), and act accordingly.
>>>>
>>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>>> It's true that the sum of no values is 0, but it's also true that the
>>>> null return semantics of combine() is more suitable for this exact
>>>> situation.
>>>>
>>>>
>>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>    wrote:
>>>>> I think we currently implicitly assume that there is at least one
>>>>> element in the Iterable passed to the combiner. The messaging code only
>>>>> invokes the combiner only if at least one message for the target vertex
>>>>> has been sent.
>>>>>
>>>>> However, we should not rely on implicit implementation details but
>>>>> explicitly specify the semantics of combiners.
>>>>>
>>>>> --sebastian
>>>>>
>>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>> Hello list,
>>>>>>
>>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>>> interesting problem with the combiner semantics.
>>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>>
>>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>>> reason the vertices get activated at each superstep.
>>>>>>
>>>>>> At each superstep, under-the-hood, I pass the combiner for each vertex
>>>>>> an Iterable, which can be empty:
>>>>>>
>>>>>>      public Iterable<M>    getMessages(I vertexId) {
>>>>>>        Iterable<M>    messages = inMessages.getMessages(vertexId);
>>>>>>        if (combiner != null) {
>>>>>>                M combinedMsg;
>>>>>>                try {
>>>>>>                        combinedMsg = combiner.combine(vertexId,
>>>>>> messages);
>>>>>>                }  catch (IOException e) {
>>>>>>                        throw new RuntimeException("could not combine",
>>>>>> e);
>>>>>>                }
>>>>>>                if (combinedMsg != null) {
>>>>>>                        List<M>    tmp = new ArrayList<M>(1);
>>>>>>                        tmp.add(combinedMsg);
>>>>>>                        messages = tmp;
>>>>>>                } else {
>>>>>>                        messages = new ArrayList<M>(0);
>>>>>>                }
>>>>>>        }
>>>>>>        return messages;
>>>>>>      }
>>>>>>
>>>>>> the Iterable returned by this methods is passed to
>>>>>> basicVertex.putMessages() right before the compute().
>>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>>> sum of 0 over no values, or the framework that calls the combiner with
>>>>>> 0 messages?
>>>>>>
>>>>>>
>>>>>>
>>>>
>
>

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

To clarify, I was not discussing the possibility for combine to return
null. I see why it would be useful, given that combine returns M,
there's no other way to let combiner ask not to send any message,
although i agree with Jakob, I also believe returning null should be
avoided but only used, roughly, as an init value for a
reference/pointer.
Perhaps, we could, but i'm just thinking out loud here, let combine()
return Iterable<M>, basicallly letting it define what to combine to
({0, 1, k } messages). It would be a powerful extension to the model,
but maybe it's too much.

As far as the size of the messages parameter, I agree with you that 0
messages gives nothing to combine and it would be somehow awkward, it
was more a matter of synching it with the other methods getting the
messages parameter.
Probably, having a more clear javadoc will do the job here.

What do you think?

On Mon, Jan 9, 2012 at 8:42 PM, Jakob Homan <jg...@gmail.com> wrote:
> I'm not a big fan of returning null as it adds extra complexity to the
> calling code (null checks, or not, since people usually will forget
> them).  Avery is correct that combiners are application specific.  Is
> it conceivable that one would want to write a combiner that returned
> something for an input of no parameters, ie combining the empty list
> doesn't return the empty list?  I imagine for most combiners,
> combining a single message would result in that message.
>
> On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching <ac...@apache.org> wrote:
>> The javadoc for VertexCombiner#combine() is
>>
>>  /**
>>   * Combines message values for a particular vertex index.
>>   *
>>   * @param vertexIndex Index of the vertex getting these messages
>>   * @param msgList List of the messages to be combined
>>   * @return Message that is combined from {@link MsgList} or null if no
>>   *         message it to be sent
>>   * @throws IOException
>>   */
>>
>> I think we are somewhat vague on what a combiner can return to support
>> various use cases.  A combiner should be particular to a particular
>> compute() algorithm.  I think it should be legal to return null from a
>> combiner, in that case, no message should be sent to that vertex.
>>
>> It seems like it would be an overhead to call a combiner when there are 0
>> messages.  I can't see a case where that would be useful.  Perhaps we should
>> change the javadoc to insure that msgList must contain at least one message
>> to have combine() being called.
>>
>> Avery
>>
>>
>> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>>
>>> Hi Sebastian,
>>>
>>> yes, that was my point, I agree completely with you.
>>> Fixing my test was not the issue, my question was whether we want to
>>> define explicitly the semantics of this scenario.
>>> Personally, I believe the combiner should be ready to receive 0
>>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>>> and compute(), and act accordingly.
>>>
>>> In the particular example, I believe the SimpleSumCombiner is bugged.
>>> It's true that the sum of no values is 0, but it's also true that the
>>> null return semantics of combine() is more suitable for this exact
>>> situation.
>>>
>>>
>>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>  wrote:
>>>>
>>>> I think we currently implicitly assume that there is at least one
>>>> element in the Iterable passed to the combiner. The messaging code only
>>>> invokes the combiner only if at least one message for the target vertex
>>>> has been sent.
>>>>
>>>> However, we should not rely on implicit implementation details but
>>>> explicitly specify the semantics of combiners.
>>>>
>>>> --sebastian
>>>>
>>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>>
>>>>> Hello list,
>>>>>
>>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>>> interesting problem with the combiner semantics.
>>>>> currently, my code fails testBspCombiner for the following reason:
>>>>>
>>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>>> messages in the iterator (in this case it returns 0) and for this
>>>>> reason the vertices get activated at each superstep.
>>>>>
>>>>> At each superstep, under-the-hood, I pass the combiner for each vertex
>>>>> an Iterable, which can be empty:
>>>>>
>>>>>     public Iterable<M>  getMessages(I vertexId) {
>>>>>       Iterable<M>  messages = inMessages.getMessages(vertexId);
>>>>>       if (combiner != null) {
>>>>>               M combinedMsg;
>>>>>               try {
>>>>>                       combinedMsg = combiner.combine(vertexId,
>>>>> messages);
>>>>>               }  catch (IOException e) {
>>>>>                       throw new RuntimeException("could not combine",
>>>>> e);
>>>>>               }
>>>>>               if (combinedMsg != null) {
>>>>>                       List<M>  tmp = new ArrayList<M>(1);
>>>>>                       tmp.add(combinedMsg);
>>>>>                       messages = tmp;
>>>>>               } else {
>>>>>                       messages = new ArrayList<M>(0);
>>>>>               }
>>>>>       }
>>>>>       return messages;
>>>>>     }
>>>>>
>>>>> the Iterable returned by this methods is passed to
>>>>> basicVertex.putMessages() right before the compute().
>>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>>> sum of 0 over no values, or the framework that calls the combiner with
>>>>> 0 messages?
>>>>>
>>>>>
>>>>>
>>>
>>>
>>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Jakob Homan <jg...@gmail.com>.

I'm not a big fan of returning null as it adds extra complexity to the
calling code (null checks, or not, since people usually will forget
them).  Avery is correct that combiners are application specific.  Is
it conceivable that one would want to write a combiner that returned
something for an input of no parameters, ie combining the empty list
doesn't return the empty list?  I imagine for most combiners,
combining a single message would result in that message.

On Mon, Jan 9, 2012 at 11:28 AM, Avery Ching <ac...@apache.org> wrote:
> The javadoc for VertexCombiner#combine() is
>
>  /**
>   * Combines message values for a particular vertex index.
>   *
>   * @param vertexIndex Index of the vertex getting these messages
>   * @param msgList List of the messages to be combined
>   * @return Message that is combined from {@link MsgList} or null if no
>   *         message it to be sent
>   * @throws IOException
>   */
>
> I think we are somewhat vague on what a combiner can return to support
> various use cases.  A combiner should be particular to a particular
> compute() algorithm.  I think it should be legal to return null from a
> combiner, in that case, no message should be sent to that vertex.
>
> It seems like it would be an overhead to call a combiner when there are 0
> messages.  I can't see a case where that would be useful.  Perhaps we should
> change the javadoc to insure that msgList must contain at least one message
> to have combine() being called.
>
> Avery
>
>
> On 1/9/12 5:37 AM, Claudio Martella wrote:
>>
>> Hi Sebastian,
>>
>> yes, that was my point, I agree completely with you.
>> Fixing my test was not the issue, my question was whether we want to
>> define explicitly the semantics of this scenario.
>> Personally, I believe the combiner should be ready to receive 0
>> messages, as it's the case of BasicVertex::initialize(), putMessages()
>> and compute(), and act accordingly.
>>
>> In the particular example, I believe the SimpleSumCombiner is bugged.
>> It's true that the sum of no values is 0, but it's also true that the
>> null return semantics of combine() is more suitable for this exact
>> situation.
>>
>>
>> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>  wrote:
>>>
>>> I think we currently implicitly assume that there is at least one
>>> element in the Iterable passed to the combiner. The messaging code only
>>> invokes the combiner only if at least one message for the target vertex
>>> has been sent.
>>>
>>> However, we should not rely on implicit implementation details but
>>> explicitly specify the semantics of combiners.
>>>
>>> --sebastian
>>>
>>> On 09.01.2012 13:29, Claudio Martella wrote:
>>>>
>>>> Hello list,
>>>>
>>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>>> interesting problem with the combiner semantics.
>>>> currently, my code fails testBspCombiner for the following reason:
>>>>
>>>> SimpleSumCombiner::compute() returns a value even if there are no
>>>> messages in the iterator (in this case it returns 0) and for this
>>>> reason the vertices get activated at each superstep.
>>>>
>>>> At each superstep, under-the-hood, I pass the combiner for each vertex
>>>> an Iterable, which can be empty:
>>>>
>>>>     public Iterable<M>  getMessages(I vertexId) {
>>>>       Iterable<M>  messages = inMessages.getMessages(vertexId);
>>>>       if (combiner != null) {
>>>>               M combinedMsg;
>>>>               try {
>>>>                       combinedMsg = combiner.combine(vertexId,
>>>> messages);
>>>>               }  catch (IOException e) {
>>>>                       throw new RuntimeException("could not combine",
>>>> e);
>>>>               }
>>>>               if (combinedMsg != null) {
>>>>                       List<M>  tmp = new ArrayList<M>(1);
>>>>                       tmp.add(combinedMsg);
>>>>                       messages = tmp;
>>>>               } else {
>>>>                       messages = new ArrayList<M>(0);
>>>>               }
>>>>       }
>>>>       return messages;
>>>>     }
>>>>
>>>> the Iterable returned by this methods is passed to
>>>> basicVertex.putMessages() right before the compute().
>>>> Now, the question is: who's wrong? The combiner code that returns a
>>>> sum of 0 over no values, or the framework that calls the combiner with
>>>> 0 messages?
>>>>
>>>>
>>>>
>>
>>
>

Re: on the semantics of the combiner

Posted by Avery Ching <ac...@apache.org>.

The javadoc for VertexCombiner#combine() is

   /**
    * Combines message values for a particular vertex index.
    *
    * @param vertexIndex Index of the vertex getting these messages
    * @param msgList List of the messages to be combined
    * @return Message that is combined from {@link MsgList} or null if no
    *         message it to be sent
    * @throws IOException
    */

I think we are somewhat vague on what a combiner can return to support 
various use cases.  A combiner should be particular to a particular 
compute() algorithm.  I think it should be legal to return null from a 
combiner, in that case, no message should be sent to that vertex.

It seems like it would be an overhead to call a combiner when there are 
0 messages.  I can't see a case where that would be useful.  Perhaps we 
should change the javadoc to insure that msgList must contain at least 
one message to have combine() being called.

Avery

On 1/9/12 5:37 AM, Claudio Martella wrote:
> Hi Sebastian,
>
> yes, that was my point, I agree completely with you.
> Fixing my test was not the issue, my question was whether we want to
> define explicitly the semantics of this scenario.
> Personally, I believe the combiner should be ready to receive 0
> messages, as it's the case of BasicVertex::initialize(), putMessages()
> and compute(), and act accordingly.
>
> In the particular example, I believe the SimpleSumCombiner is bugged.
> It's true that the sum of no values is 0, but it's also true that the
> null return semantics of combine() is more suitable for this exact
> situation.
>
>
> On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter<ss...@apache.org>  wrote:
>> I think we currently implicitly assume that there is at least one
>> element in the Iterable passed to the combiner. The messaging code only
>> invokes the combiner only if at least one message for the target vertex
>> has been sent.
>>
>> However, we should not rely on implicit implementation details but
>> explicitly specify the semantics of combiners.
>>
>> --sebastian
>>
>> On 09.01.2012 13:29, Claudio Martella wrote:
>>> Hello list,
>>>
>>> for GIRAPH-45 I'm touching the incoming messages and hit an
>>> interesting problem with the combiner semantics.
>>> currently, my code fails testBspCombiner for the following reason:
>>>
>>> SimpleSumCombiner::compute() returns a value even if there are no
>>> messages in the iterator (in this case it returns 0) and for this
>>> reason the vertices get activated at each superstep.
>>>
>>> At each superstep, under-the-hood, I pass the combiner for each vertex
>>> an Iterable, which can be empty:
>>>
>>>      public Iterable<M>  getMessages(I vertexId) {
>>>        Iterable<M>  messages = inMessages.getMessages(vertexId);
>>>        if (combiner != null) {
>>>                M combinedMsg;
>>>                try {
>>>                        combinedMsg = combiner.combine(vertexId, messages);
>>>                }  catch (IOException e) {
>>>                        throw new RuntimeException("could not combine", e);
>>>                }
>>>                if (combinedMsg != null) {
>>>                        List<M>  tmp = new ArrayList<M>(1);
>>>                        tmp.add(combinedMsg);
>>>                        messages = tmp;
>>>                } else {
>>>                        messages = new ArrayList<M>(0);
>>>                }
>>>        }
>>>        return messages;
>>>      }
>>>
>>> the Iterable returned by this methods is passed to
>>> basicVertex.putMessages() right before the compute().
>>> Now, the question is: who's wrong? The combiner code that returns a
>>> sum of 0 over no values, or the framework that calls the combiner with
>>> 0 messages?
>>>
>>>
>>>
>
>

Re: on the semantics of the combiner

Posted by Claudio Martella <cl...@gmail.com>.

Hi Sebastian,

yes, that was my point, I agree completely with you.
Fixing my test was not the issue, my question was whether we want to
define explicitly the semantics of this scenario.
Personally, I believe the combiner should be ready to receive 0
messages, as it's the case of BasicVertex::initialize(), putMessages()
and compute(), and act accordingly.

In the particular example, I believe the SimpleSumCombiner is bugged.
It's true that the sum of no values is 0, but it's also true that the
null return semantics of combine() is more suitable for this exact
situation.


On Mon, Jan 9, 2012 at 2:21 PM, Sebastian Schelter <ss...@apache.org> wrote:
> I think we currently implicitly assume that there is at least one
> element in the Iterable passed to the combiner. The messaging code only
> invokes the combiner only if at least one message for the target vertex
> has been sent.
>
> However, we should not rely on implicit implementation details but
> explicitly specify the semantics of combiners.
>
> --sebastian
>
> On 09.01.2012 13:29, Claudio Martella wrote:
>> Hello list,
>>
>> for GIRAPH-45 I'm touching the incoming messages and hit an
>> interesting problem with the combiner semantics.
>> currently, my code fails testBspCombiner for the following reason:
>>
>> SimpleSumCombiner::compute() returns a value even if there are no
>> messages in the iterator (in this case it returns 0) and for this
>> reason the vertices get activated at each superstep.
>>
>> At each superstep, under-the-hood, I pass the combiner for each vertex
>> an Iterable, which can be empty:
>>
>>     public Iterable<M> getMessages(I vertexId) {
>>       Iterable<M> messages = inMessages.getMessages(vertexId);
>>       if (combiner != null) {
>>               M combinedMsg;
>>               try {
>>                       combinedMsg = combiner.combine(vertexId, messages);
>>               }  catch (IOException e) {
>>                       throw new RuntimeException("could not combine", e);
>>               }
>>               if (combinedMsg != null) {
>>                       List<M> tmp = new ArrayList<M>(1);
>>                       tmp.add(combinedMsg);
>>                       messages = tmp;
>>               } else {
>>                       messages = new ArrayList<M>(0);
>>               }
>>       }
>>       return messages;
>>     }
>>
>> the Iterable returned by this methods is passed to
>> basicVertex.putMessages() right before the compute().
>> Now, the question is: who's wrong? The combiner code that returns a
>> sum of 0 over no values, or the framework that calls the combiner with
>> 0 messages?
>>
>>
>>
>



-- 
   Claudio Martella
   claudio.martella@gmail.com

Re: on the semantics of the combiner

Posted by Sebastian Schelter <ss...@apache.org>.

I think we currently implicitly assume that there is at least one
element in the Iterable passed to the combiner. The messaging code only
invokes the combiner only if at least one message for the target vertex
has been sent.

However, we should not rely on implicit implementation details but
explicitly specify the semantics of combiners.

--sebastian

On 09.01.2012 13:29, Claudio Martella wrote:
> Hello list,
> 
> for GIRAPH-45 I'm touching the incoming messages and hit an
> interesting problem with the combiner semantics.
> currently, my code fails testBspCombiner for the following reason:
> 
> SimpleSumCombiner::compute() returns a value even if there are no
> messages in the iterator (in this case it returns 0) and for this
> reason the vertices get activated at each superstep.
> 
> At each superstep, under-the-hood, I pass the combiner for each vertex
> an Iterable, which can be empty:
> 
>     public Iterable<M> getMessages(I vertexId) {
>     	Iterable<M> messages = inMessages.getMessages(vertexId);
>     	if (combiner != null) {
>     		M combinedMsg;
>     		try {
>     			combinedMsg = combiner.combine(vertexId, messages);
>     		}  catch (IOException e) {
>     			throw new RuntimeException("could not combine", e);
>     		}
>     		if (combinedMsg != null) {
>     			List<M> tmp = new ArrayList<M>(1);
>     			tmp.add(combinedMsg);
>     			messages = tmp;
>     		} else {
>     			messages = new ArrayList<M>(0);
>     		}
>     	}
>     	return messages;
>     }
> 
> the Iterable returned by this methods is passed to
> basicVertex.putMessages() right before the compute().
> Now, the question is: who's wrong? The combiner code that returns a
> sum of 0 over no values, or the framework that calls the combiner with
> 0 messages?
> 
> 
>