Posted to user@storm.apache.org by Matt Lowe <ma...@gmail.com> on 2016/11/02 15:30:46 UTC

Question about fields grouping

Hello everyone,

I am using Storm version 0.9.4.
I have a topology like so:

Spout -> RoutingBolt -> CacheBolt

The Spout sends messages to the RoutingBolt, which assigns an Id
(`tableName/tableEndPoint/time/bucket/bucket/event`) and uses fields grouping
on it, so that all tuples with the same Id hit the same CacheBolt.

Here is an image of the CPU spread (*you can ignore everything from 11/2
onward, it's unrelated*):

[image not included in the plain-text version]

The important part of the ID is the time:
`tableName/tableEndPoint/*time*/bucket/bucket/event`
The time is truncated to the current hour, so 14:33 becomes 14:00.
That means every hour the fields grouping should assign the IDs to a
new random bolt, yet they all seem to keep going to the same
ones, as the image above shows.

Can anyone with a better understanding of fields grouping explain why this is?

Thanks

Re: Question about fields grouping

Posted by Navin Ipe <na...@searchlighthealth.com>.
These posts were written specifically to explain fields grouping:
http://nrecursions.blogspot.in/2016/09/concepts-about-storm-you-need-to-know.html#fieldsgroupingdoesnotgetoverwhelmedwithdata
and
http://nrecursions.blogspot.in/2016/09/understanding-fields-grouping-in-apache.html
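
In short, fields grouping does not pick a "random" bolt. As far as I
understand it, the target task is derived from a hash of the grouping field
values, taken modulo the number of target tasks, so the choice is
deterministic and not load-aware. Below is a minimal sketch of that idea. It
is not Storm's actual code: the class name, the chooseTask helper, and the
task count of 8 are all made up for illustration, and the exact hash function
Storm uses may differ.

```java
import java.util.Arrays;
import java.util.List;

public class FieldsGroupingSketch {

    // Hypothetical helper (not a Storm API): maps the grouping field values
    // to a task index by hashing them and taking the result modulo the
    // number of target tasks. floorMod keeps the index non-negative.
    public static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 8; // assumed CacheBolt parallelism, for illustration only
        String[] hours = {"13:00", "14:00", "15:00"};

        for (String hour : hours) {
            String id = "tableName/tableEndPoint/" + hour + "/bucket/bucket/event";
            List<Object> key = Arrays.<Object>asList(id);
            // The same id always yields the same task index. A new hour gives
            // a new id, but its task is still fixed by the hash, not chosen
            // at random and not chosen based on current load.
            System.out.println(id + " -> task " + chooseTask(key, numTasks));
        }
    }
}
```

With a mapping like this, how evenly the load spreads depends on how many
distinct IDs there are and how many tuples each ID carries. A handful of busy
IDs can keep a few tasks hot no matter which hour they belong to.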



-- 
Regards,
Navin