Posted to user@storm.apache.org by "Jens-U. Mozdzen" <jm...@nde.ag> on 2015/01/05 11:30:53 UTC

fieldsGrouping() for multiple input streams

Hi *,

I have a bolt C (running with multiple instances) which receives two  
input streams, from bolt A and B.

Both input streams have the same key fields in their tuples, plus  
additional, distinct fields.

Does Storm support fieldsGrouping() so that tuples from both bolt A  
and B are distributed across the bolt C instances, in a way that  
tuples with identical key fields from A and B end up on the same  
instance of C? IOW, can fieldsGrouping() do its grouping from a  
receiving end's point of view, rather than from the sender's POV?

Regards,
Jens



Re: fieldsGrouping() for multiple input streams

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Michael,

Quoting Michael Rose <mi...@fullcontact.com>:
> In this case, it's correct -- a fields grouping ensures that it ends up at
> the same task each time regardless of the source.

Now that's something I can quote ;) Thank you for the confirmation,
it'll make the corresponding part of my current topology much more
elegant.

Regards,
Jens


Re: fieldsGrouping() for multiple input streams

Posted by Michael Rose <mi...@fullcontact.com>.
In this case, it's correct -- a fields grouping ensures that it ends up at
the same task each time regardless of the source.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
michael@fullcontact.com

On Fri, Jan 9, 2015 at 7:35 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Kosala,
>
> Quoting "UmaraDissa1 ." <um...@gmail.com>:
>
>> I believe all tuples with key = valueX will end up in the same instance of
>> bolt C, regardless of whether the tuple originated from A or B.
>>
>> It seems this is what Nathan mentions in this reply
>> <https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
>> well.
>>
>> *"Just to make sure what Ben is saying is clear, in a fields grouping a
>> grouping value will always go to the same consumer task, regardless of the
>> source. Otherwise things like joins of different streams would not be
>> possible. "*
>>
>
> That thread turned up in my search, too - rather old, but possibly still
> valid. OTOH I'd have expected such essential information to be given in
> the docs, but couldn't spot it there, hence the question here. "I believe"
> is not something I want to cite to our customers as a reference source,
> in case it turns out to be wrong in a production environment ;)
>
> Regards,
> Jens
>
>

Re: fieldsGrouping() for multiple input streams

Posted by Kosala Dissanayake <um...@gmail.com>.
That's all right :)

On Mon, Jan 12, 2015 at 3:28 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Kosala,
>
> Quoting "UmaraDissa1 ." <um...@gmail.com>:
>
>> Your 'reference source' would have been the comment I took time out of my
>> day to dig out for you, which was made by the original developer of Storm,
>> and which if you had properly understood would not have made you ask this
>> question.
>>
>
> Sorry if you feel offended - that was not my intention.
>
> Jens
>
>

Re: fieldsGrouping() for multiple input streams

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,

Quoting "UmaraDissa1 ." <um...@gmail.com>:
> Your 'reference source' would have been the comment I took time out of my
> day to dig out for you, which was made by the original developer of Storm,
> and which if you had properly understood would not have made you ask this
> question.

Sorry if you feel offended - that was not my intention.

Jens


Re: fieldsGrouping() for multiple input streams

Posted by "UmaraDissa1 ." <um...@gmail.com>.
Your 'reference source' would have been the comment I took time out of my
day to dig out for you, which was made by the original developer of Storm,
and which if you had properly understood would not have made you ask this
question.



On Sat, Jan 10, 2015 at 1:35 AM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Kosala,
>
> Quoting "UmaraDissa1 ." <um...@gmail.com>:
>
>> I believe all tuples with key = valueX will end up in the same instance of
>> bolt C, regardless of whether the tuple originated from A or B.
>>
>> It seems this is what Nathan mentions in this reply
>> <https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
>> well.
>>
>> *"Just to make sure what Ben is saying is clear, in a fields grouping a
>> grouping value will always go to the same consumer task, regardless of the
>> source. Otherwise things like joins of different streams would not be
>> possible. "*
>>
>
> That thread turned up in my search, too - rather old, but possibly still
> valid. OTOH I'd have expected such essential information to be given in
> the docs, but couldn't spot it there, hence the question here. "I believe"
> is not something I want to cite to our customers as a reference source,
> in case it turns out to be wrong in a production environment ;)
>
> Regards,
> Jens
>
>

Re: fieldsGrouping() for multiple input streams

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,

Quoting "UmaraDissa1 ." <um...@gmail.com>:
> I believe all tuples with key = valueX will end up in the same instance of
> bolt C, regardless of whether the tuple originated from A or B.
>
> It seems this is what Nathan mentions in this reply
> <https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
> well.
>
> *"Just to make sure what Ben is saying is clear, in a fields grouping a
> grouping value will always go to the same consumer task, regardless of the
> source. Otherwise things like joins of different streams would not be
> possible. "*

That thread turned up in my search, too - rather old, but possibly
still valid. OTOH I'd have expected such essential information to
be given in the docs, but couldn't spot it there, hence the question
here. "I believe" is not something I want to cite to our customers as
a reference source, in case it turns out to be wrong in a production
environment ;)

Regards,
Jens


Re: fieldsGrouping() for multiple input streams

Posted by "UmaraDissa1 ." <um...@gmail.com>.
I believe all tuples with key = valueX will end up in the same instance of
bolt C, regardless of whether the tuple originated from A or B.

It seems this is what Nathan mentions in this reply
<https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
well.

*"Just to make sure what Ben is saying is clear, in a fields grouping a
grouping value will always go to the same consumer task, regardless of the
source. Otherwise things like joins of different streams would not be
possible. "*
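
To illustrate why the quoted guarantee matters for joins, here is a
minimal sketch in plain Java (no Storm API involved). The class
JoinStateSketch and its method onTuple() are hypothetical names made
up for this illustration; they stand in for the per-key state that a
single instance of bolt C might keep. Such in-memory state can only
work if both tuples for a key reach the same instance.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for ONE instance of bolt C; plain Java, not Storm code.
final class JoinStateSketch {
    // Payloads that arrived from one side and still wait for their partner.
    private final Map<String, String> pendingFromA = new HashMap<>();
    private final Map<String, String> pendingFromB = new HashMap<>();

    // Returns the joined record once both sides for a key have arrived,
    // otherwise stores the payload and returns null.
    String onTuple(String source, String key, String payload) {
        boolean isA = source.equals("A");
        Map<String, String> own = isA ? pendingFromA : pendingFromB;
        Map<String, String> other = isA ? pendingFromB : pendingFromA;

        String match = other.remove(key);
        if (match == null) {
            own.put(key, payload); // partner not seen yet on this instance
            return null;
        }
        // Always emit the A payload first, then the B payload.
        return isA ? key + ": " + payload + " + " + match
                   : key + ": " + match + " + " + payload;
    }

    public static void main(String[] args) {
        JoinStateSketch instanceOfC = new JoinStateSketch();
        System.out.println(instanceOfC.onTuple("A", "valueX", "fA1,fA2"));
        System.out.println(instanceOfC.onTuple("B", "valueX", "fB1,fB2"));
    }
}
```

If tuples for "valueX" from A and from B were routed to different
instances, each instance would hold only one half and the join would
never complete.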






On Thu, Jan 8, 2015 at 9:42 PM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi Kosala,
>
> Quoting "UmaraDissa1 ." <um...@gmail.com>:
>
>> Hi Jens,
>>
>> I'm not entirely sure if I understand what you want to achieve, but
>> wouldn't having two input streams to bolt C, one each from A and B, each
>> with a fields grouping, solve your problem?
>>
>
> A sends tuples with fields "key, fA1, fA2".
> B sends tuples with fields "key, fB1, fB2".
>
> The topology is created with bolt C connecting to bolt A and bolt B, both
> times via fieldsGrouping() on "key". C does run with more than one instance.
>
> A sends tuples with key="valueX", key="valueY" and so on (plus values for
> fA1, fA2).
> B sends tuples with key="valueX", key="valueY" and so on (plus values for
> fB1, fB2).
>
> My question is: Will all tuples, either from A or B, with key="valueX",
> end up in the same instance of bolt C?
>
> Or is grouping handled individually, thus all tuples from A with
> key="valueX" will end up in the same instance of C, but all tuples from B
> with key="valueX" might end up in another single instance of C?
>
> Regards,
> Jens
>
>

Re: fieldsGrouping() for multiple input streams

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,

Quoting "UmaraDissa1 ." <um...@gmail.com>:
> Hi Jens,
>
> I'm not entirely sure if I understand what you want to achieve, but
> wouldn't having two input streams to bolt C, one each from A and B, each
> with a fields grouping, solve your problem?

A sends tuples with fields "key, fA1, fA2".
B sends tuples with fields "key, fB1, fB2".

The topology is created with bolt C connecting to bolt A and bolt B,  
both times via fieldsGrouping() on "key". C does run with more than  
one instance.

A sends tuples with key="valueX", key="valueY" and so on (plus values  
for fA1, fA2).
B sends tuples with key="valueX", key="valueY" and so on (plus values  
for fB1, fB2).

My question is: Will all tuples, either from A or B, with  
key="valueX", end up in the same instance of bolt C?

Or is grouping handled individually, thus all tuples from A with  
key="valueX" will end up in the same instance of C, but all tuples  
from B with key="valueX" might end up in another single instance of C?

Regards,
Jens
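
The scenario above can be modelled in a few lines of plain Java. This
is only a simplified sketch, not Storm's actual implementation: the
class FieldsGroupingModel and the helper taskFor() are hypothetical,
and the hash-modulo rule is an assumption about how a fields grouping
might pick a consumer task. The point it shows is that the target task
is computed from the grouping values and the number of consumer tasks
only, so the sending bolt plays no role.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a fields grouping; NOT Storm's actual code.
final class FieldsGroupingModel {
    // Hypothetical helper: choose the consumer task from the grouping values only.
    static int taskFor(List<?> groupingValues, int numTasks) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasksOfC = 4; // assumed parallelism of bolt C

        // Tuple from A: (key, fA1, fA2); tuple from B: (key, fB1, fB2).
        List<Object> fromA = Arrays.asList("valueX", "a1", "a2");
        List<Object> fromB = Arrays.asList("valueX", "b1", "b2");

        // Only the "key" field (index 0) takes part in the grouping.
        int taskA = taskFor(fromA.subList(0, 1), numTasksOfC);
        int taskB = taskFor(fromB.subList(0, 1), numTasksOfC);

        System.out.println("A -> task " + taskA + ", B -> task " + taskB);
    }
}
```

In this model both lines of the topology (A to C and B to C) use the
same function over the same key value, so both tuples map to the same
task index regardless of the extra fields they carry.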


Re: fieldsGrouping() for multiple input streams

Posted by "UmaraDissa1 ." <um...@gmail.com>.
Hi Jens,

I'm not entirely sure if I understand what you want to achieve, but
wouldn't having two input streams to bolt C, one each from A and B, each
with a fields grouping, solve your problem?

Cheers,
Kosala

On Mon, Jan 5, 2015 at 9:30 PM, Jens-U. Mozdzen <jm...@nde.ag> wrote:

> Hi *,
>
> I have a bolt C (running with multiple instances) which receives two input
> streams, from bolt A and B.
>
> Both input streams have the same key fields in their tuples, plus
> additional, distinct fields.
>
> Does Storm support fieldsGrouping() so that tuples from both bolt A and B
> are distributed across the bolt C instances, in a way that tuples with
> identical key fields from A and B end up on the same instance of C? IOW,
> can fieldsGrouping() do its grouping from a receiving end's point of view,
> rather than from the sender's POV?
>
> Regards,
> Jens
>
>
>