Posted to user@storm.apache.org by Javier Gonzalez <ja...@gmail.com> on 2015/08/07 21:01:44 UTC

Multiple Streams vs Multiple Subscribers of the same stream

Hi all,

Suppose I have a bolt A that has to send information to two bolts B and C.
Each bolt must receive different information from the original A bolt.
Which of these strategies is more efficient?

Strategy 1:
- Have A declare a single output stream, with fields "forB" and "forC".
- Emit all the information in a single tuple, putting the information for
Bolt B in "forB" and the information for Bolt C in "forC".
- Have Bolt B and Bolt C subscribe to Bolt A's single output stream.
- In the execute methods of Bolt B and Bolt C, read only the relevant part
of the input tuple.

Strategy 2:
- Have A declare two output streams, "streamB" and "streamC".
- Emit one tuple with the information for Bolt B in streamB, and one with
the information for Bolt C in streamC.
- Have each bolt subscribe only to its relevant stream.
- Each bolt works as usual with its payload in its execute method.

A priori I would think Strategy 2 is better (as we would be emitting
smaller tuples), but I'm not sure whether there's a hidden cost or benefit
in having multiple subscribers to a single stream.
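
To make the two strategies concrete, here is a minimal in-memory sketch of
what each bolt would receive. StreamModel and its methods are hypothetical
names invented for illustration; this is not the Storm API. In Storm itself
the wiring would go through declareStream on the output fields declarer and
a stream id passed to the grouping call. The payload strings are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of stream subscriptions. Hypothetical; not the Storm API. */
final class StreamModel {
    // stream id -> names of the bolts subscribed to it
    private final Map<String, List<String>> subscribers = new HashMap<>();
    // bolt name -> tuples that bolt has received, in arrival order
    private final Map<String, List<List<String>>> inbox = new HashMap<>();

    void subscribe(String streamId, String bolt) {
        subscribers.computeIfAbsent(streamId, k -> new ArrayList<>()).add(bolt);
    }

    /** Every subscriber of the stream receives its own copy of the whole tuple. */
    void emit(String streamId, List<String> tuple) {
        List<String> targets =
                subscribers.getOrDefault(streamId, Collections.<String>emptyList());
        for (String bolt : targets) {
            inbox.computeIfAbsent(bolt, k -> new ArrayList<>()).add(new ArrayList<>(tuple));
        }
    }

    List<List<String>> receivedBy(String bolt) {
        return inbox.getOrDefault(bolt, Collections.<List<String>>emptyList());
    }

    /** Strategy 1: one stream carrying fields forB and forC, two subscribers. */
    static StreamModel strategy1() {
        StreamModel m = new StreamModel();
        m.subscribe("default", "B");
        m.subscribe("default", "C");
        m.emit("default", Arrays.asList("payloadForB", "payloadForC"));
        return m;
    }

    /** Strategy 2: one stream per consumer, each carrying only that consumer's field. */
    static StreamModel strategy2() {
        StreamModel m = new StreamModel();
        m.subscribe("streamB", "B");
        m.subscribe("streamC", "C");
        m.emit("streamB", Arrays.asList("payloadForB"));
        m.emit("streamC", Arrays.asList("payloadForC"));
        return m;
    }

    public static void main(String[] args) {
        System.out.println("Strategy 1, B receives: " + strategy1().receivedBy("B"));
        System.out.println("Strategy 2, B receives: " + strategy2().receivedBy("B"));
    }
}
```

In this model, Bolt B under Strategy 1 receives both fields and has to
ignore "forC", while under Strategy 2 it receives only its own payload.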

Thank you,
Javier

Re: Multiple Streams vs Multiple Subscribers of the same stream

Posted by Javier Gonzalez <ja...@gmail.com>.
Thank you Nathan and Kishore.

On Fri, Aug 7, 2015 at 4:37 PM, Nathan Leung <nc...@gmail.com> wrote:

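> It's even worse: with option 1 the information for both bolts is sent
> twice (once to each subscriber), instead of the information for one bolt
> being sent once. So assuming the same message size and the same message
> frequency for both bolts, each emit puts 4x the data of a single one-bolt
> message on the wire, twice the total of option 2. Use option 2.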


-- 
Javier González Nicolini

Re: Multiple Streams vs Multiple Subscribers of the same stream

Posted by Nathan Leung <nc...@gmail.com>.
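It's even worse: with option 1 the information for both bolts is sent twice
(once to each subscriber), instead of the information for one bolt being
sent once. So assuming the same message size and the same message frequency
for both bolts, each emit puts 4x the data of a single one-bolt message on
the wire, twice the total of option 2. Use option 2.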
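
The arithmetic behind that estimate can be sketched as follows. This is a
back-of-the-envelope model, not a measurement: PayloadMath and its method
names are hypothetical, the 100-byte payload is an assumed figure, and the
model ignores tuple metadata, serialization overhead, and any transfers
that never leave a single process.

```java
/** Payload arithmetic for the two options. A rough model with assumed sizes. */
final class PayloadMath {
    /** Option 1: every subscriber receives the combined tuple (forB + forC). */
    static long singleStreamBytes(long bytesForB, long bytesForC, int subscribers) {
        return (bytesForB + bytesForC) * subscribers;
    }

    /** Option 2: each bolt receives only its own payload on its own stream. */
    static long separateStreamBytes(long bytesForB, long bytesForC) {
        return bytesForB + bytesForC;
    }

    public static void main(String[] args) {
        long m = 100; // assumed payload size in bytes, identical for B and C
        long option1 = singleStreamBytes(m, m, 2);
        long option2 = separateStreamBytes(m, m);
        System.out.println("option 1 bytes per emit: " + option1);
        System.out.println("option 2 bytes per emit: " + option2);
        System.out.println("option 1 relative to one message: " + (option1 / m) + "x");
        System.out.println("option 1 relative to option 2: " + (option1 / option2) + "x");
    }
}
```

With equal payload sizes, option 1 moves four message-sized payloads per
emit where option 2 moves two.
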
On Aug 7, 2015 1:18 PM, "Kishore Senji" <ks...@gmail.com> wrote:

> I also think option 2 is better. There is another reason to choose it
> besides the smaller payload that goes across. Today bolt A might split the
> stream 1:1 between B and C. But if that later becomes 1:2, for example,
> having a separate stream for C allows you to scale Bolt C (more
> parallelism) to improve the throughput. If you had only one stream, you
> could give 1 message to B and 2 messages (as a list) to C, but there would
> be no way to scale C: even if you added more parallelism, the throughput
> wouldn't improve, as C would have to process the 2 messages serially.
>
> I do not think there is a cost to having more streams, so choosing the
> second option might be better.

Re: Multiple Streams vs Multiple Subscribers of the same stream

Posted by Kishore Senji <ks...@gmail.com>.
I also think option 2 is better. There is another reason to choose it
besides the smaller payload that goes across. Today bolt A might split the
stream 1:1 between B and C. But if that later becomes 1:2, for example,
having a separate stream for C allows you to scale Bolt C (more
parallelism) to improve the throughput. If you had only one stream, you
could give 1 message to B and 2 messages (as a list) to C, but there would
be no way to scale C: even if you added more parallelism, the throughput
wouldn't improve, as C would have to process the 2 messages serially.

I do not think there is a cost to having more streams, so choosing the
second option might be better.
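
One way to read the scaling point is as a toy timing model for the work C
does per emit from A. Everything here is an assumption made for
illustration: ScalingModel and its method names are hypothetical, the
per-message cost is made up, and messages on the separate stream are
assumed to spread evenly across C's tasks. The model covers only the time
to finish C's work for a single emit, not aggregate throughput across many
tuples.

```java
/** Toy timing model for bolt C. Assumed numbers; not a Storm benchmark. */
final class ScalingModel {
    /**
     * One shared stream: C's messages arrive bundled as a list inside one tuple,
     * so a single task of C works through all of them in sequence.
     */
    static long bundledMillis(int messagesForC, long millisPerMessage, int tasksOfC) {
        // tasksOfC is deliberately unused: extra tasks cannot split a single tuple.
        return messagesForC * millisPerMessage;
    }

    /**
     * Separate stream: each message is its own tuple and may go to a different
     * task of C (assumes an even spread across tasks).
     */
    static long separateMillis(int messagesForC, long millisPerMessage, int tasksOfC) {
        int rounds = (messagesForC + tasksOfC - 1) / tasksOfC; // ceiling division
        return rounds * millisPerMessage;
    }

    public static void main(String[] args) {
        long cost = 10; // assumed processing time per message, in milliseconds
        for (int tasks = 1; tasks <= 2; tasks++) {
            System.out.println("tasks of C = " + tasks
                    + ", bundled: " + bundledMillis(2, cost, tasks) + " ms"
                    + ", separate: " + separateMillis(2, cost, tasks) + " ms");
        }
    }
}
```

In this model the bundled case takes the same time with one task or two,
while the separate-stream case halves once a second task of C is added.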
