You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@storm.apache.org by "Singh, Sandeep" <sa...@verisign.com> on 2019/07/29 23:18:31 UTC

Strange time aggregation behavior exhibited by BaseWindowedBolt

During testing of my topology which uses Storm's Tumbling window, I see strange behavior how my stream of tuples are handled and split into different time windows.

I am using a Tumbling window with duration and lag set to 10 seconds:

                val duration = BaseWindowedBolt.Duration.seconds(10)

                myBolt.withTumblingWindow(duration).withTimestampField("timestampField").withLag(duration)



When I send four tuples with timestamp set to same value "now - 1 second" (where now = System.currentTimeMillis()), I see log messages that storm is able to extract the time information from tuples. However bolt's "execute(inputWindow: TupleWindow)" method never gets invoked. In my test I wait for 2 minutes. I do not see any log message about late tuples.



When I send five tuples,  the first four with timestamp  "now - 1 second" and last one with "now + 1 hour", I see Storm is able to extract all the five tuples.  However the execute(inputWindow: TupleWindow) method is either invoked

  a) only once with first four tuple (the behavior I expected)  or,

  b) twice, first invocation with tuple 1 & 2, second invocation with tuple 3 & 4. Since all the four tuples have exactly same timestamp, I don't understand why tuples are partitined in different time windows.

Also the bolt's execute method never get's invoked with 5th tuple. However, sending 5th tuple (which is well outside the time duration window of 10 seconds) ensure that execute method is called once or twice for the first four tuples.