Posted to dev@hama.apache.org by Thomas Jungblut <th...@googlemail.com> on 2011/10/07 19:59:39 UTC

Combiners

Hey all,

I have just seen Avery's talk about Giraph. (I know you sometimes read this
list, too. Thank you very much!)
It was quite enlightening regarding which features we should discuss.
He made some points on Combiners.

I wonder if we can integrate them into our framework itself rather than into
the upcoming graph layer.
Miklos implemented the MessageBundle, which stores messages of the same
class in a map.
That is somewhat similar to Combiners, but not user-defined.
I think we should let users define their own Combiners! But I'm still not
very sure how.
Should we let them iterate over the messages via an Iterable, like the
Reducer in M/R?
For example, this could save a lot of memory if a user decides to merge
messages.
On the other hand, users could just merge their messages right before
sending them, so it might end up being an unused feature.
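To make the Reducer-style option concrete, here is a minimal sketch assuming a hypothetical Combiner interface (Combiner, SumCombiner and CombinerDemo are illustrative names, not part of Hama). The framework would hand the combiner every queued message for one destination as an Iterable and send only the single result.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical user-facing interface: receives all messages queued for one
// destination and returns a single merged message, like a Reducer without a key.
interface Combiner<M> {
    M combine(Iterable<M> messages);
}

// Example user-defined combiner: sums integer messages.
class SumCombiner implements Combiner<Integer> {
    @Override
    public Integer combine(Iterable<Integer> messages) {
        int sum = 0;
        for (Integer m : messages) {
            sum += m;
        }
        return sum;
    }
}

class CombinerDemo {
    public static void main(String[] args) {
        List<Integer> queued = Arrays.asList(3, 4, 5);
        Combiner<Integer> combiner = new SumCombiner();
        // Three queued messages collapse into one message carrying 12.
        System.out.println(combiner.combine(queued));
    }
}
```

Because the combiner only sees an Iterable, the framework would be free to stream the messages rather than materialize them all, which is where the memory saving mentioned above would come from.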

As you can see, there are some things that should be clarified. Do you have
an idea / opinion on that?

-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Combiners

Posted by "Edward J. Yoon" <ed...@apache.org>.
> I remember "Aggregators"; weren't those the min, max functions?

As I understand it, an aggregator is a different thing.
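For reference, the Pregel paper separates the two: a combiner merges messages addressed to the same vertex, while an aggregator reduces values contributed by all vertices into one global value that is visible to everyone in the next superstep. A minimal Java sketch of that difference, with hypothetical helper names and String IDs standing in for real destinations:

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

class AggregatorVsCombiner {
    // Combiner: merges messages per destination; each destination still
    // receives its own (now single) value.
    static Map<String, Integer> combinePerDestination(
            List<Map.Entry<String, Integer>> outgoing, BinaryOperator<Integer> op) {
        Map<String, Integer> combined = new HashMap<>();
        for (Map.Entry<String, Integer> msg : outgoing) {
            combined.merge(msg.getKey(), msg.getValue(), op);
        }
        return combined;
    }

    // Aggregator: reduces one contributed value per task into a single global
    // value that every task could read in the next superstep.
    // Assumes at least one contribution.
    static int aggregate(List<Integer> contributions, BinaryOperator<Integer> op) {
        int result = contributions.get(0);
        for (int i = 1; i < contributions.size(); i++) {
            result = op.apply(result, contributions.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> outgoing = new ArrayList<>();
        outgoing.add(new AbstractMap.SimpleEntry<>("v1", 5));
        outgoing.add(new AbstractMap.SimpleEntry<>("v1", 2));
        outgoing.add(new AbstractMap.SimpleEntry<>("v2", 7));
        // Combiner with min: v1 receives 2 and v2 receives 7 (two messages instead of three).
        System.out.println(combinePerDestination(outgoing, Integer::min));
        // Aggregator with max over per-task values: one global result, 9.
        System.out.println(aggregate(Arrays.asList(4, 9, 1), Integer::max));
    }
}
```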




-- 
Best Regards, Edward J. Yoon
@eddieyoon

Re: Combiners

Posted by Thomas Jungblut <th...@googlemail.com>.
Thanks, Edward, for your reply.

> It is a slightly different concept from bundling messages. As
> described in the Pregel paper, a 'Combiner' function can be used to reduce
> the size of messages in cases where messages can be summarized
> arithmetically, e.g., min, max, sum, and average.
>

I remember "Aggregators"; weren't those the min, max functions?

> If we add this function to the BSP programming model, BSP application code
> can be more concise and maintainable. Otherwise, users have to write their
> own code in the bsp() function.
>

Yes. That was my intention, too.

We could use them in our graph examples (PageRank and SSSP) to combine the
communication between the master task and the real tasks (the number of
updated vertices / the difference between the new and old rank).
But I am still facing some design issues, e.g., when should the combiner be
called, and in what form? In MapReduce, combiners are plain reducers, so we
could use the same model, except that messages would be grouped by the class
name of the message rather than by a key.
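One possible shape for the "when and in what form" question, sketched under loose assumptions: OutgoingQueue, registerCombiner, send and flush are hypothetical names, not Hama's API. Outgoing messages are grouped per destination peer and per message class (a Class object stands in for the class name), and the combiner registered for that class runs once per group right before the messages would be sent at the barrier; classes without a combiner pass through unchanged.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Hypothetical outgoing-message buffer (illustration only, not Hama's API).
class OutgoingQueue {
    private final Map<Class<?>, BinaryOperator<Object>> combiners = new HashMap<>();
    // peer -> message class -> buffered messages, in insertion order
    private final Map<String, Map<Class<?>, List<Object>>> pending = new LinkedHashMap<>();

    <M> void registerCombiner(Class<M> type, BinaryOperator<M> combiner) {
        // The casts are safe: this combiner is only applied to messages of `type`.
        combiners.put(type, (a, b) -> combiner.apply(type.cast(a), type.cast(b)));
    }

    void send(String peer, Object msg) {
        pending.computeIfAbsent(peer, p -> new LinkedHashMap<>())
               .computeIfAbsent(msg.getClass(), c -> new ArrayList<>())
               .add(msg);
    }

    // Called once per superstep, just before the barrier: returns the
    // messages that would actually go over the wire to each peer.
    Map<String, List<Object>> flush() {
        Map<String, List<Object>> wire = new LinkedHashMap<>();
        for (Map.Entry<String, Map<Class<?>, List<Object>>> peer : pending.entrySet()) {
            List<Object> out = new ArrayList<>();
            for (Map.Entry<Class<?>, List<Object>> group : peer.getValue().entrySet()) {
                BinaryOperator<Object> combiner = combiners.get(group.getKey());
                if (combiner == null) {
                    out.addAll(group.getValue()); // no combiner: send unchanged
                } else {
                    out.add(group.getValue().stream().reduce(combiner).get());
                }
            }
            wire.put(peer.getKey(), out);
        }
        pending.clear();
        return wire;
    }
}

class OutgoingQueueDemo {
    public static void main(String[] args) {
        OutgoingQueue queue = new OutgoingQueue();
        queue.registerCombiner(Integer.class, Integer::sum); // e.g. updated-vertex counts
        queue.send("master", 3);
        queue.send("master", 5);
        queue.send("master", "log: superstep done"); // String has no combiner
        System.out.println(queue.flush()); // {master=[8, log: superstep done]}
    }
}
```

Running the combiner at flush time keeps the user code a plain binary reduction; the trade-off in this sketch is that raw messages stay buffered until the barrier.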

We need a smart idea ;)





-- 
Thomas Jungblut
Berlin <th...@gmail.com>

Re: Combiners

Posted by "Edward J. Yoon" <ed...@apache.org>.
It is a slightly different concept from bundling messages. As
described in the Pregel paper, a 'Combiner' function can be used to reduce
the size of messages in cases where messages can be summarized
arithmetically, e.g., min, max, sum, and average.
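A small sketch of such arithmetic combine functions, with illustrative class names (PartialAvg, ArithmeticCombiners) that are assumptions, not existing Hama code. Min and sum (and max, symmetrically) can merge plain values directly; average can be combined too, as long as each message carries a sum and a count rather than a finished mean.

```java
import java.util.List;

// Partial average: carries a running sum and count so that partial results
// from different senders can be merged without losing information.
final class PartialAvg {
    final double sum;
    final long count;

    PartialAvg(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    static PartialAvg of(double value) {
        return new PartialAvg(value, 1);
    }

    PartialAvg merge(PartialAvg other) {
        return new PartialAvg(sum + other.sum, count + other.count);
    }

    double value() {
        return sum / count;
    }
}

// Combine functions for messages addressed to the same destination.
// Each assumes a non-empty list of messages.
final class ArithmeticCombiners {
    static double min(List<Double> msgs) {
        double result = msgs.get(0);
        for (double m : msgs) {
            result = Math.min(result, m);
        }
        return result;
    }

    static double sum(List<Double> msgs) {
        double result = 0.0;
        for (double m : msgs) {
            result += m;
        }
        return result;
    }

    // Averages merge correctly only via (sum, count); averaging averages would not.
    static PartialAvg average(List<PartialAvg> msgs) {
        PartialAvg result = msgs.get(0);
        for (int i = 1; i < msgs.size(); i++) {
            result = result.merge(msgs.get(i));
        }
        return result;
    }
}
```

For example, combining 1 and 2 gives (sum 3, count 2), and merging that with 6 gives (9, 3), i.e. 3.0, the true mean of 1, 2 and 6. Averaging the partial mean 1.5 with 6 would give 3.75 instead.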

If we add this function to the BSP programming model, BSP application code
can be more concise and maintainable. Otherwise, users have to write their
own code in the bsp() function.

Therefore, I'm +1.




-- 
Best Regards, Edward J. Yoon
@eddieyoon