Posted to user@storm.apache.org by "Jens-U. Mozdzen" <jm...@nde.ag> on 2015/01/05 11:30:53 UTC
fieldsGrouping() for multiple input streams
Hi *,
I have a bolt C (running with multiple instances) which receives two
input streams, from bolt A and B.
Both input streams have the same key fields in their tuples, plus
additional, distinct fields.
Does Storm support fieldsGrouping() so that tuples from both bolt A
and B are distributed across the bolt C instances, in a way that
tuples with identical key fields from A and B end up on the same
instance of C? IOW, can fieldsGrouping() do its grouping from a
receiving end's point of view, rather than from the sender's POV?
Regards,
Jens
Re: fieldsGrouping() for multiple input streams
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Michael,
Quoting Michael Rose <mi...@fullcontact.com>:
> In this case, it's correct -- a fields grouping ensures that it ends up at
> the same task each time regardless of the source.
Now that's something I can quote ;) Thank you for the confirmation,
it'll make the corresponding part of my current topology much more
elegant.
Regards,
Jens
Re: fieldsGrouping() for multiple input streams
Posted by Michael Rose <mi...@fullcontact.com>.
In this case, it's correct -- a fields grouping ensures that it ends up at
the same task each time regardless of the source.
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
michael@fullcontact.com
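For readers who want to see why the source does not matter, here is a minimal sketch of the idea, assuming the consumer task is picked purely from a hash of the grouping values modulo the number of consumer tasks. The class `FieldsGroupingModel` and the helper `chooseTask` are hypothetical names made up for this illustration; this is a simplified model, not Storm's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a fields grouping; NOT Storm's actual code.
final class FieldsGroupingModel {

    // Hypothetical helper: picks a consumer task index from the grouping
    // values only. The sending component is not an input at all.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasksOfC = 4; // assumed parallelism of bolt C

        // Tuple from A: (key, fA1, fA2); tuple from B: (key, fB1, fB2).
        List<Object> fromA = Arrays.asList("valueX", "a1", "a2");
        List<Object> fromB = Arrays.asList("valueX", "b1", "b2");

        // Only the first field ("key") takes part in the grouping.
        int taskForA = chooseTask(fromA.subList(0, 1), numTasksOfC);
        int taskForB = chooseTask(fromB.subList(0, 1), numTasksOfC);

        System.out.println(taskForA == taskForB); // prints "true"
    }
}
```

Under this assumption, two tuples with the same "key" value map to the same task index because the function never looks at where the tuple came from, only at the grouping values and the task count of C.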
Re: fieldsGrouping() for multiple input streams
Posted by Kosala Dissanayake <um...@gmail.com>.
That's all right :)
Re: fieldsGrouping() for multiple input streams
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,
Quoting "UmaraDissa1 ." <um...@gmail.com>:
> Your 'reference source' would have been the comment I took time out of my
> day to dig out for you, which was made by the original developer of Storm,
> and which if you had properly understood would not have made you ask this
> question.
Sorry if you feel offended; that was not my intention.
Jens
Re: fieldsGrouping() for multiple input streams
Posted by "UmaraDissa1 ." <um...@gmail.com>.
Your 'reference source' would have been the comment I took time out of my
day to dig out for you, which was made by the original developer of Storm,
and which if you had properly understood would not have made you ask this
question.
Re: fieldsGrouping() for multiple input streams
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,
Quoting "UmaraDissa1 ." <um...@gmail.com>:
> I believe all tuples with key = valueX will end up in the same instance of
> bolt C, regardless of whether the tuple originated from A or B.
>
> It seems this is what Nathan mentions in this reply
> <https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
> well.
>
> *"Just to make sure what Ben is saying is clear, in a fields grouping a
> grouping value will always go to the same consumer task, regardless of the
> source. Otherwise things like joins of different streams would not be
> possible. "*
That thread turned up in my search, too. It is rather old, but possibly
still valid. On the other hand, I'd have expected such essential
information to be given in the docs, but couldn't spot it there, hence
the question here. "I believe" is not something I want to cite to our
customers as a reference source if it later turns out to be wrong in a
production environment ;)
Regards,
Jens
Re: fieldsGrouping() for multiple input streams
Posted by "UmaraDissa1 ." <um...@gmail.com>.
I believe all tuples with key = valueX will end up in the same instance of
bolt C, regardless of whether the tuple originated from A or B.
It seems this is what Nathan mentions in this reply
<https://groups.google.com/d/msg/storm-user/ueRqxGhO6QI/F-CFUfJ65r0J> as
well.
*"Just to make sure what Ben is saying is clear, in a fields grouping a
grouping value will always go to the same consumer task, regardless of the
source. Otherwise things like joins of different streams would not be
possible. "*
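To illustrate why a join depends on this guarantee, here is a minimal sketch of the per-task state one instance of bolt C might keep, assuming every tuple for a given key reaches the same instance. The class `KeyedJoinBuffer` and its method `onTuple` are hypothetical names for this illustration and are not part of the Storm API; the payloads are plain strings to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical join state held by ONE instance of bolt C; not a Storm API.
final class KeyedJoinBuffer {
    private final Map<String, String> pendingFromA = new HashMap<>();
    private final Map<String, String> pendingFromB = new HashMap<>();

    // Returns the joined record once both sides for the key have arrived
    // at this instance, otherwise null.
    String onTuple(String source, String key, String payload) {
        boolean isA = source.equals("A");
        Map<String, String> own = isA ? pendingFromA : pendingFromB;
        Map<String, String> other = isA ? pendingFromB : pendingFromA;

        String match = other.remove(key);
        if (match == null) {
            own.put(key, payload); // remember it and wait for the other stream
            return null;
        }
        // Always emit the A payload first, regardless of arrival order.
        return isA ? key + ":" + payload + "|" + match
                   : key + ":" + match + "|" + payload;
    }
}
```

The buffer only ever sees tuples delivered to its own instance. If tuples from A and B with the same key could land on different instances of C, the lookup in `other` would never find its partner, which is the point of the quoted remark.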
Re: fieldsGrouping() for multiple input streams
Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Kosala,
Quoting "UmaraDissa1 ." <um...@gmail.com>:
> Hi Jens,
>
> I'm not entirely sure if I understand what you want to achieve, but
> wouldn't having two input streams to bolt C, one each from A and B, each
> with a fields grouping, solve your problem?
A sends tuples with fields "key, fA1, fA2".
B sends tuples with fields "key, fB1, fB2".
The topology is created with bolt C connecting to bolt A and bolt B,
both times via fieldsGrouping() on "key". C does run with more than
one instance.
A sends tuples with key="valueX", key="valueY" and so on (plus values
for fA1, fA2).
B sends tuples with key="valueX", key="valueY" and so on (plus values
for fB1, fB2).
My question is: Will all tuples, either from A or B, with
key="valueX", end up in the same instance of bolt C?
Or is grouping handled per source, so that all tuples from A with
key="valueX" end up in one instance of C, while all tuples from B
with key="valueX" might end up in a different instance of C?
Regards,
Jens
Re: fieldsGrouping() for multiple input streams
Posted by "UmaraDissa1 ." <um...@gmail.com>.
Hi Jens,
I'm not entirely sure if I understand what you want to achieve, but
wouldn't having two input streams to bolt C, one each from A and B, each
with a fields grouping, solve your problem?
Cheers,
Kosala