Posted to user@storm.apache.org by Software Dev <st...@gmail.com> on 2014/04/01 18:45:49 UTC

Implementing Real-Time Trending Topics in Storm

I read the article (http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/)
and was wondering what the rationale is for the emit frequencies
and how they relate to each other.

In the example the RollingCountBolt emits every 3 seconds,
IntermediateRankingBolt every 2 seconds and TotalRankingBolt every 2
seconds. Does this mean that the rolling counts for the last 9 events
are ranked and emitted every 2 seconds? Every 7 seconds? I'm a little confused.

Thanks

Re: Implementing Real-Time Trending Topics in Storm

Posted by Software Dev <st...@gmail.com>.
> Does that make sense?

Yes and no.

In the example on your blog, the RollingCountBolt is configured with 9
and 3, which I understand to mean: emit the latest 9-second rolling
window every 3 seconds. I just don't understand the 2-second emit
frequencies of the other bolts.

Re: Implementing Real-Time Trending Topics in Storm

Posted by "Michael G. Noll" <mi...@michael-noll.com>.
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

"Software Dev",

In RollingCountBolt there are two *time*-related settings:

1. The size (duration) of the sliding window itself, in seconds.
2. The time interval at which the latest sliding window count is sent
to downstream bolts, in seconds.

See details here:
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/bolt/RollingCountBolt.java

I'm quoting from the code above:

"The bolt is configured by two parameters, the length of the sliding
window in seconds (which influences the output data of the bolt, i.e.
how it will count objects) and the emit frequency in seconds (which
influences how often the bolt will output the latest window counts).
For instance, if the window length is set to an equivalent of five
minutes and the emit frequency to one minute, then the bolt will
output the latest five-minute sliding window every minute."
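
One way to read the two settings together is as a slot count: if the
counter is advanced once per emit interval, then the window length
divided by the emit frequency gives the number of slots the window
spans. The sketch below only illustrates that arithmetic. The class and
method names are hypothetical and are not taken from storm-starter, and
the requirement that the window length be a multiple of the emit
frequency is an assumption made here to keep the example simple.

```java
// Illustrative sketch only; WindowSlots and slotsFor are hypothetical names,
// not part of the storm-starter code.
final class WindowSlots {

    /** Number of slots needed when one slot covers exactly one emit interval. */
    static int slotsFor(int windowLengthInSeconds, int emitFrequencyInSeconds) {
        if (windowLengthInSeconds <= 0 || emitFrequencyInSeconds <= 0) {
            throw new IllegalArgumentException("both settings must be positive");
        }
        // Assumption for this sketch: the window is a whole number of emit intervals.
        if (windowLengthInSeconds % emitFrequencyInSeconds != 0) {
            throw new IllegalArgumentException(
                "window length must be a multiple of the emit frequency");
        }
        return windowLengthInSeconds / emitFrequencyInSeconds;
    }

    public static void main(String[] args) {
        System.out.println(slotsFor(9, 3));    // 9-second window, emitted every 3 seconds
        System.out.println(slotsFor(300, 60)); // five-minute window, emitted every minute
    }
}
```

With the values from the question (9 and 3) this gives 3 slots, and with
the five-minute/one-minute example from the quote it gives 5.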


> Does this mean that the rolling counts for the last 9 events are 
> ranked and emitted every 2 seconds? 7 seconds

The RollingCountBolt "thinks" in seconds.  However, behind the scenes
RollingCountBolt uses SlidingWindowCounter [1], which in turn is built
upon SlotBasedCounter [2].  Neither the SlidingWindowCounter nor the
SlotBasedCounter knows anything about time or durations (no
seconds, minutes, and such).  This is by design, as it decouples the
responsibility of counting (SlidingWindowCounter/SlotBasedCounter)
from the responsibility of tracking the time (RollingCountBolt).
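
To make the decoupling concrete, here is a minimal sketch of a
slot-based counter that has no notion of time. It is a simplified,
hypothetical stand-in written for this explanation, not the actual
SlidingWindowCounter or SlotBasedCounter code. The counter only counts
and advances when told to; whoever calls getCountsThenAdvance() (in
Storm's case that would be the bolt, on its emit interval) is the only
party that knows about seconds.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical sketch; not the storm-starter classes.
final class SimpleSlidingCounter<T> {
    private final Map<T, long[]> slotsByObject = new HashMap<>();
    private final int numSlots;
    private int headSlot = 0;

    SimpleSlidingCounter(int numSlots) {
        if (numSlots < 2) {
            throw new IllegalArgumentException("need at least two slots");
        }
        this.numSlots = numSlots;
    }

    /** Counts one occurrence of obj in the current (head) slot. */
    void increment(T obj) {
        slotsByObject.computeIfAbsent(obj, k -> new long[numSlots])[headSlot]++;
    }

    /**
     * Returns the totals over all slots, then moves the head to the next
     * slot and clears it. The caller decides when this happens; the counter
     * itself never looks at a clock. Objects whose total drops to zero are
     * kept in the map to keep the sketch short.
     */
    Map<T, Long> getCountsThenAdvance() {
        Map<T, Long> totals = new HashMap<>();
        for (Map.Entry<T, long[]> entry : slotsByObject.entrySet()) {
            long sum = 0;
            for (long count : entry.getValue()) {
                sum += count;
            }
            totals.put(entry.getKey(), sum);
        }
        headSlot = (headSlot + 1) % numSlots;
        for (long[] slots : slotsByObject.values()) {
            slots[headSlot] = 0; // the oldest slot falls out of the window
        }
        return totals;
    }

    public static void main(String[] args) {
        SimpleSlidingCounter<String> counter = new SimpleSlidingCounter<>(3);
        counter.increment("storm");
        counter.increment("storm");
        System.out.println(counter.getCountsThenAdvance()); // first "tick"
        counter.increment("storm");
        System.out.println(counter.getCountsThenAdvance()); // second "tick"
    }
}
```

In this sketch a count stays in the totals for three consecutive calls
to getCountsThenAdvance() and then drops out, regardless of how much
wall-clock time passes between the calls.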

The Apache Spark project has exactly the same notions as
emitFrequencyInSeconds and windowLengthInSeconds, which it calls
slideInterval and windowLength, respectively.  See
https://spark.apache.org/docs/0.9.0/streaming-programming-guide.html.
It also has a diagram similar to the one I showed in [3] that
explains the idea behind sliding windows; see the section "Window
Operations" at the Spark link above.
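
As a rough illustration of how a window length and a slide interval
interact, the following plain-Java sketch prints which span of seconds
each emitted window would cover for a 9-second window that slides every
3 seconds. It uses neither the Spark nor the Storm API. The names are
hypothetical, and clipping the window start at 0 for the first few
emits is an assumption of this sketch.

```java
// Plain-Java arithmetic only; WindowTimeline is a hypothetical helper and
// does not use the Spark or Storm APIs.
final class WindowTimeline {

    /** Start of the window that ends at emitTime, clipped at 0 while the window fills up. */
    static int windowStart(int emitTime, int windowLength) {
        return Math.max(0, emitTime - windowLength);
    }

    public static void main(String[] args) {
        int windowLength = 9;  // corresponds to windowLengthInSeconds
        int slideInterval = 3; // corresponds to emitFrequencyInSeconds
        for (int t = slideInterval; t <= 15; t += slideInterval) {
            System.out.printf("emit at t=%ds covers seconds %d..%d%n",
                t, windowStart(t, windowLength), t);
        }
    }
}
```

Once the window has filled up (from t=9 onwards in this example), each
emit covers the most recent 9 seconds and overlaps the previous emit by
6 seconds.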


Does that make sense?
Michael



[1]
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlidingWindowCounter.java
[2]
https://github.com/apache/incubator-storm/blob/master/examples/storm-starter/src/jvm/storm/starter/tools/SlotBasedCounter.java
[3]
http://www.michael-noll.com/blog/2013/01/18/implementing-real-time-trending-topics-in-storm/


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (MingW32)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iEYEARECAAYFAlM7A2kACgkQeW5XuG18ujR93wCdHE6Ldu01fRgnMqjIi7chVMbu
uEMAnjUyrZQq0xkg2REUzbgvk31A85Dm
=YI7Y
-----END PGP SIGNATURE-----