Posted to user@storm.apache.org by Jonathan Yom-Tov <jo...@gmail.com> on 2014/08/08 18:25:29 UTC

How do I implement this topology in Storm?

I want to implement a topology that is similar to the RollingTopWords
topology in the Storm examples
<https://github.com/apache/incubator-storm/tree/master/examples/storm-starter>.
The idea is to count the frequency of the words emitted. The spouts emit
words at random, and the first-level bolts count each word's frequency and
pass it on. The twist is that I want the bolts to pass on a word's
frequency only once its frequency on one of the bolts has exceeded a
threshold. So, for example, if the word "Nathan" passed the threshold of 5
occurrences within a time window on one bolt, then all bolts would start
passing "Nathan"'s frequency onwards.

What I thought of doing is adding another layer of bolts that holds the
list of words that have passed the threshold. These bolts would receive the
words and frequencies from the previous layer and pass them on only if the
word appears in the list. Obviously, this list would have to be
synchronized across the whole layer of bolts.

Is this a good idea? What would be the best way of implementing it?

Re: How do I implement this topology in Storm?

Posted by Jonathan Yom-Tov <jo...@gmail.com>.
OK, thanks. And is there another option if I don't want to go down the
fields grouping path?


On Fri, Aug 8, 2014 at 8:52 PM, P. Taylor Goetz <pt...@gmail.com> wrote:

> The logic for fields grouping is stateless. It does a hash mod on the
> field values. Essentially:
>
> selectedTask = fields.hashCode() % numTasks
>
> - Taylor


-- 
Got a cool idea for a web startup? How about I build it for you? Check out
http://chapter64.com/

Re: How do I implement this topology in Storm?

Posted by "P. Taylor Goetz" <pt...@gmail.com>.
The logic for fields grouping is stateless. It does a hash mod on the field values. Essentially:

selectedTask = fields.hashCode() % numTasks

- Taylor
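
For illustration, the hash-mod idea can be sketched in plain Java without any Storm dependency. This is a minimal sketch, not Storm's actual source: the class name `FieldsGroupingSketch` and the method `chooseTask` are hypothetical. `Math.floorMod` is used in place of a bare `%` on the assumption that the result should always be a valid task index, since Java's `%` yields a negative result when `hashCode()` is negative.

```java
import java.util.Arrays;
import java.util.List;

final class FieldsGroupingSketch {
    // Hypothetical helper: picks a task index from the grouping field values alone.
    // No shared state or coordinator is consulted.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        // floorMod keeps the result in [0, numTasks) even for a negative hash code.
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        for (String word : new String[] {"Nathan", "Taylor", "Nathan"}) {
            List<Object> values = Arrays.<Object>asList(word);
            System.out.println(word + " -> task " + chooseTask(values, numTasks));
        }
    }
}
```

Because the result depends only on the field values and the number of tasks, every sender computes the same task for the same word, which is why no central authority is needed.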

On Aug 8, 2014, at 1:39 PM, Jonathan Yom-Tov <jo...@gmail.com> wrote:

> It's an option, but if I understand correctly, fields grouping makes sure each word goes to the same bolt every time. I'm guessing this requires some sort of central authority to coordinate. How does that happen?


Re: How do I implement this topology in Storm?

Posted by Jonathan Yom-Tov <jo...@gmail.com>.
It's an option, but if I understand correctly, fields grouping makes sure
each word goes to the same bolt every time. I'm guessing this requires some
sort of central authority to coordinate. How does that happen?


On Fri, Aug 8, 2014 at 8:27 PM, Nathan Leung <nc...@gmail.com> wrote:

> Why not use fields grouping like the example so that you don't have to do
> any coordination across bolts?



Re: How do I implement this topology in Storm?

Posted by Nathan Leung <nc...@gmail.com>.
Why not use fields grouping like the example so that you don't have to do
any coordination across bolts?
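
As a rough illustration of this suggestion, here is a Storm-free sketch in plain Java. It assumes that every occurrence of a word is routed to the same counting task by a hash mod on the word, so that task alone holds the word's full count and the threshold check needs no shared list. All names (`ThresholdCountSketch`, `CounterTask`, `observe`, `send`, `resetWindow`) are hypothetical, "passing the threshold" is assumed to mean a count strictly greater than the threshold, and the time window is reduced to a manual reset rather than a real timer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ThresholdCountSketch {
    /** Stands in for one counting bolt task; it only sees the words routed to it. */
    static final class CounterTask {
        private final Map<String, Integer> counts = new HashMap<>();
        private final int threshold;

        CounterTask(int threshold) {
            this.threshold = threshold;
        }

        /** Counts one occurrence; returns the count to pass on, or -1 while suppressed. */
        int observe(String word) {
            int count = counts.merge(word, 1, Integer::sum);
            return count > threshold ? count : -1;
        }

        /** Clears the counts at the end of a window (window timing is not modeled). */
        void resetWindow() {
            counts.clear();
        }
    }

    private final List<CounterTask> tasks = new ArrayList<>();

    ThresholdCountSketch(int numTasks, int threshold) {
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new CounterTask(threshold));
        }
    }

    /** Routes the word by hash mod, then lets the chosen task count it. */
    int send(String word) {
        int task = Math.floorMod(word.hashCode(), tasks.size());
        return tasks.get(task).observe(word);
    }

    public static void main(String[] args) {
        ThresholdCountSketch topology = new ThresholdCountSketch(3, 5);
        for (int i = 1; i <= 7; i++) {
            int emitted = topology.send("Nathan");
            System.out.println("occurrence " + i + ": "
                    + (emitted < 0 ? "suppressed" : "emit count " + emitted));
        }
    }
}
```

Note that under this routing the threshold applies to the word's whole count, rather than to a partial count held by one of several bolts.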