Posted to user@storm.apache.org by Roberto Coluccio <ro...@gmail.com> on 2014/03/05 17:52:36 UTC

Storm stream grouping examples

Hello folks,

I was unable to find any complete example (or, better, related work in the
scientific literature) in which (almost) all the *stream grouping
policies* have been used and compared. Do you have any reference you
could please share with me?

Thank you and best regards,

Roberto Coluccio

Re: Storm stream grouping examples

Posted by Roberto Coluccio <ro...@gmail.com>.
Thank you, guys. I was looking for a benchmark to add to an
"official" document as a reference. Somebody asked me to do this, and I
agree that the choice of grouping policy depends 90% on the business
logic. I was just wondering if there was any "public result" I hadn't been
able to find!

Thank you again.




On Wed, Mar 5, 2014 at 7:22 PM, Michael Rose <mi...@fullcontact.com> wrote:

> +1, localOrShuffle will be a winner, as long as it's distributing work
> evenly. If one tuple could produce, say, anywhere from 1 to 100 resulting
> tuples (and those results were expensive enough to process, e.g. I/O), it
> might well be worth using shuffle instead of localOrShuffle.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> michael@fullcontact.com

Re: Storm stream grouping examples

Posted by Michael Rose <mi...@fullcontact.com>.
+1, localOrShuffle will be a winner, as long as it's distributing work
evenly. If one tuple could produce, say, anywhere from 1 to 100 resulting
tuples (and those results were expensive enough to process, e.g. I/O), it
might well be worth using shuffle instead of localOrShuffle.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
michael@fullcontact.com
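
One reading of the fan-out caveat above, as a toy Python model (not Storm's
implementation): two workers, each running one upstream task and two
downstream tasks, where every upstream input yields between 1 and 100 output
tuples. Shuffle is modeled here as round-robin over a task list that is
re-shuffled on every pass, and local-or-shuffle as the same picker restricted
to the downstream tasks in the sender's own worker. The worker layout, the 20
inputs per worker and the seeds are arbitrary assumptions for illustration.

```python
# Toy model of variable fan-out; NOT Storm's own code.
# Hypothetical setup: 2 workers, each with one upstream task and two
# downstream tasks. Every upstream input yields 1..100 output tuples.
import random

DOWNSTREAM = {0: [0, 1], 1: [2, 3]}  # worker id -> downstream task ids (assumed)
ALL_TASKS = [t for ts in DOWNSTREAM.values() for t in ts]


class ShuffledRoundRobin:
    """Pick tasks in a freshly shuffled order; each task once per pass."""

    def __init__(self, tasks, rng):
        self.tasks, self.rng, self.queue = list(tasks), rng, []

    def choose(self):
        if not self.queue:
            self.queue = self.tasks[:]
            self.rng.shuffle(self.queue)
        return self.queue.pop()


def make_fan_outs(inputs_per_worker=20, seed=7):
    # Fan-out of each input handled by the upstream task in each worker.
    rng = random.Random(seed)
    return {w: [rng.randint(1, 100) for _ in range(inputs_per_worker)]
            for w in DOWNSTREAM}


def route(fan_outs, local_only, seed=11):
    """Return the number of tuples received by each downstream task."""
    rng = random.Random(seed)
    load = {t: 0 for t in ALL_TASKS}
    for worker, outs in fan_outs.items():
        # local-or-shuffle: stay inside the worker; shuffle: use every task.
        targets = DOWNSTREAM[worker] if local_only else ALL_TASKS
        chooser = ShuffledRoundRobin(targets, rng)
        for _ in range(sum(outs)):
            load[chooser.choose()] += 1
    return load


fan_outs = make_fan_outs()
print("emitted per worker:", {w: sum(o) for w, o in fan_outs.items()})
print("shuffle:           ", route(fan_outs, local_only=False))
print("local-or-shuffle:  ", route(fan_outs, local_only=True))
```

In this model shuffle keeps the four per-task counts within two tuples of each
other, while under local-or-shuffle each worker's two tasks split only what
their own upstream task emitted, so any difference between the workers'
totals lands unevenly on the downstream tasks. The model counts tuples only;
whether that imbalance outweighs the extra network hops depends on how
expensive each tuple is to process.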


On Wed, Mar 5, 2014 at 11:19 AM, Nathan Leung <nc...@gmail.com> wrote:

> In my experience on a 1 Gb network localOrShuffleGrouping was a clear
> winner in terms of performance.  But I haven't tested with 10 Gb, and if
> you have substantial business logic then that becomes a bigger factor than
> serializing/transferring data on the network.  I think the performance of
> any given grouping is too dependent on your business logic; it will be
> difficult to quantify how well it performs in a canned benchmark.  And
> sometimes your business logic will define a grouping for you (e.g. fields
> grouping) whether it's the best performer or not.

Re: Storm stream grouping examples

Posted by Nathan Leung <nc...@gmail.com>.
In my experience on a 1 Gb network localOrShuffleGrouping was a clear
winner in terms of performance.  But I haven't tested with 10 Gb, and if
you have substantial business logic then that becomes a bigger factor than
serializing/transferring data on the network.  I think the performance of
any given grouping is too dependent on your business logic; it will be
difficult to quantify how well it performs in a canned benchmark.  And
sometimes your business logic will define a grouping for you (e.g. fields
grouping) whether it's the best performer or not.
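
The locality argument can be made concrete with a toy model. In a real
topology the three policies are selected with shuffleGrouping,
localOrShuffleGrouping and fieldsGrouping on the declarer returned by
TopologyBuilder.setBolt; the sketch below does not use Storm at all. It is a
hypothetical Python simulation, assuming a downstream bolt with four tasks
split across two workers and a single sender in worker 0, that counts how
many tuples each policy sends across a worker boundary and how many tasks see
each key. Shuffle is modeled as a uniform random pick, local-or-shuffle as a
random pick restricted to tasks in the sender's worker (falling back to all
tasks when there are none), and fields grouping as a toy stable hash of the
key modulo the task count.

```python
# Toy model of three Storm stream groupings; NOT Storm's own code.
# Hypothetical setup: 4 downstream tasks on 2 workers, one sender in worker 0.
import random
from collections import defaultdict

TASK_WORKER = {0: 0, 1: 0, 2: 1, 3: 1}  # task id -> worker id (assumed layout)
SENDER_WORKER = 0                        # worker of the emitting task (assumed)


def shuffle(tasks, rng, tup):
    # Modeled as a uniform random pick among all tasks, local or remote.
    return rng.choice(tasks)


def local_or_shuffle(tasks, rng, tup):
    # Prefer tasks in the sender's worker; fall back to all tasks if none.
    local = [t for t in tasks if TASK_WORKER[t] == SENDER_WORKER]
    return rng.choice(local or tasks)


def fields(tasks, rng, tup):
    # Same key always maps to the same task. A toy stable hash is used
    # because Python's built-in str hash is randomized per process.
    return tasks[sum(tup[0].encode()) % len(tasks)]


def route(grouping, tuples, seed=42):
    """Return (tuples per task, number of tuples sent to another worker)."""
    rng = random.Random(seed)
    tasks = sorted(TASK_WORKER)
    placed = defaultdict(list)
    remote = 0
    for tup in tuples:
        task = grouping(tasks, rng, tup)
        placed[task].append(tup)
        if TASK_WORKER[task] != SENDER_WORKER:
            remote += 1
    return placed, remote


def tasks_seeing(placed, word):
    # How many distinct tasks received at least one tuple with this key.
    return sum(1 for ts in placed.values() if (word,) in ts)


words = [("apple",), ("pear",), ("apple",), ("fig",), ("pear",), ("apple",)] * 100

for grouping in (shuffle, local_or_shuffle, fields):
    placed, remote = route(grouping, words)
    spread = {w: tasks_seeing(placed, w) for w in ("apple", "pear", "fig")}
    print(f"{grouping.__name__:17} remote={remote:4} tasks_per_word={spread}")
```

Under these assumptions local-or-shuffle never crosses a worker boundary,
shuffle sends about half of the tuples to the other worker in expectation,
and only fields grouping keeps each word on a single task, which is what a
per-word counter needs whatever its network cost. With only three distinct
keys, fields grouping may also leave some tasks with nothing to do.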


On Wed, Mar 5, 2014 at 1:05 PM, Roberto Coluccio <roberto.coluccio@gmail.com> wrote:

> Hello Michael, thanks for your feedback.
>
> I'm looking for a performance comparison. I know that not all the policies
> are "really comparable", but even obvious comparisons all listed together
> could be a useful reference.
>
> Roberto

Re: Storm stream grouping examples

Posted by Roberto Coluccio <ro...@gmail.com>.
Hello Michael, thanks for your feedback.

I'm looking for a performance comparison. I know that not all the policies
are "really comparable", but even obvious comparisons all listed together
could be a useful reference.

Roberto


On Wed, Mar 5, 2014 at 6:58 PM, Michael Rose <mi...@fullcontact.com> wrote:

> What kind of comparisons are you looking for? How they work functionally?
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> michael@fullcontact.com

Re: Storm stream grouping examples

Posted by Michael Rose <mi...@fullcontact.com>.
What kind of comparisons are you looking for? How they work functionally?

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
michael@fullcontact.com


On Wed, Mar 5, 2014 at 9:52 AM, Roberto Coluccio <roberto.coluccio@gmail.com> wrote:

> Hello folks,
>
> I was unable to find any complete example (or, better, related work in the
> scientific literature) in which (almost) all the *stream grouping
> policies* have been used and compared. Do you have any reference you
> could please share with me?
>
> Thank you and best regards,
>
> Roberto Coluccio
>