Posted to issues@storm.apache.org by "Stig Rohde Døssing (JIRA)" <ji...@apache.org> on 2017/06/01 17:46:04 UTC

[jira] [Commented] (STORM-2540) Get rid of window compaction in WindowManager

    [ https://issues.apache.org/jira/browse/STORM-2540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16033374#comment-16033374 ] 

Stig Rohde Døssing commented on STORM-2540:
-------------------------------------------

I've changed my mind. The compaction mechanism is likely bad in most cases, but I noticed that the Trident windowing API allows the user to combine any type of TriggerPolicy with any type of EvictionPolicy. I had been assuming that the two policies were always "paired", i.e. of the same type.

I don't want to work on this because I don't have enough experience with the windowing API to know whether mixing policies makes sense, or in which cases compaction may be necessary.

> Get rid of window compaction in WindowManager
> ---------------------------------------------
>
>                 Key: STORM-2540
>                 URL: https://issues.apache.org/jira/browse/STORM-2540
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-client
>    Affects Versions: 2.0.0
>            Reporter: Stig Rohde Døssing
>            Assignee: Stig Rohde Døssing
>
> Storm's windowing support uses trigger and eviction policies to control the size of the windows passed to WindowingBolts. The WindowManager has a hard-coded limit of 100 tuples before it starts evicting tuples from the window, probably as an attempt to avoid overly large windows when using time-based eviction policies. Whenever a tuple is added to the window, the hard cap is checked, and if the number of tuples in the window exceeds the cap, the WindowManager evaluates the EvictionPolicy for the tuples to figure out whether some can be removed.
> This hard cap is ineffective in most configurations, and has a surprising interaction with the count based policy.
> If the windowing bolt is configured to use timestamp fields in the tuples to determine the current time, the WatermarkingXPolicy classes are used. In this configuration, compaction does nothing, because tuples cannot be evicted until the WatermarkGenerator sends a new watermark, and when it does, the TriggerPolicy causes the WindowManager to evict any expired tuples anyway.
> If the windowing bolt is using the count-based policy, compaction has the unexpected effect of hard-capping the user's configured max count at 100. If the configured count is less than 100, compaction again has no effect.
> When the bolt is configured to use the tuple-arrival-time policy, compaction only has an effect if there are tuples older than the configured window duration, which only occurs if the window triggers slightly late. This can cause tuples to be evicted from the window before the user's bolt sees them. Even when tuples are evicted by the compaction mechanism, they are kept in memory until the next time a window is presented to the user's bolt.
> I think the compaction mechanism should be removed. The only policy that benefits from it is the time-based policy, and in that case it would be better to add a configurable max tuple count to that policy.
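
The compaction behavior described in the issue above can be made concrete with a minimal Java model. It assumes arrival-time eviction and represents each tuple only by its arrival timestamp. It is a simplified sketch under those assumptions and is not Storm's WindowManager. The class `CompactingWindow`, its methods, and the constant `COMPACTION_THRESHOLD` are hypothetical names, and the trigger logic is reduced to the bare minimum.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified model of the compaction mechanism described in the
// issue. Not Storm's actual WindowManager; all names here are illustrative.
class CompactingWindow {
    // Hard-coded cap from the issue: compaction runs only once the window
    // holds more than this many tuples.
    static final int COMPACTION_THRESHOLD = 100;

    private final long windowLengthMs;
    // Tuples in the current window, oldest first (represented by arrival time).
    private final Deque<Long> window = new ArrayDeque<>();
    // Evicted tuples stay referenced here until the next trigger.
    private final List<Long> expired = new ArrayList<>();

    CompactingWindow(long windowLengthMs) {
        this.windowLengthMs = windowLengthMs;
    }

    // Assumed arrival-time eviction rule: a tuple expires once it is older
    // than the window length.
    private boolean isExpired(long arrivalMs, long nowMs) {
        return nowMs - arrivalMs > windowLengthMs;
    }

    // Adds a tuple; the eviction rule is consulted only if the cap is exceeded.
    void add(long arrivalMs, long nowMs) {
        window.addLast(arrivalMs);
        if (window.size() > COMPACTION_THRESHOLD) {
            evictExpired(nowMs);
        }
    }

    // Moves expired tuples from the head of the window to the expired list.
    private void evictExpired(long nowMs) {
        while (!window.isEmpty() && isExpired(window.peekFirst(), nowMs)) {
            expired.add(window.removeFirst());
        }
    }

    // Simulates a trigger: evicts expired tuples, drops the retained expired
    // list, and returns a copy of the tuples presented to the bolt.
    List<Long> trigger(long nowMs) {
        evictExpired(nowMs);
        expired.clear();
        return new ArrayList<>(window);
    }

    int windowSize() {
        return window.size();
    }

    int retainedExpired() {
        return expired.size();
    }
}
```

Consider a 1000 ms window. Adding 50 tuples at time 0 and then 51 tuples at time 2000 pushes the window past 100 tuples, so compaction moves the 50 stale tuples into the expired list. They stay there until `trigger` is called. If only 10 stale tuples are added, the cap is never exceeded and nothing is evicted before the next trigger.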

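The issue's closing suggestion can be sketched as a time-based window whose only size bound is a user-configured maximum tuple count. This Java sketch is hypothetical and does not correspond to an existing Storm class. `BoundedTimeWindow`, `maxCount`, and the choice to evict on every add are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the proposal: a time-based eviction rule that also
// enforces a user-configured maximum tuple count. Names are illustrative only.
class BoundedTimeWindow {
    private final long windowLengthMs;
    // User setting that replaces the hard-coded cap.
    private final int maxCount;
    // Tuples in the window, oldest first (represented by arrival time).
    private final Deque<Long> arrivals = new ArrayDeque<>();

    BoundedTimeWindow(long windowLengthMs, int maxCount) {
        if (windowLengthMs <= 0 || maxCount <= 0) {
            throw new IllegalArgumentException("window length and max count must be positive");
        }
        this.windowLengthMs = windowLengthMs;
        this.maxCount = maxCount;
    }

    // Adds a tuple, then evicts from the oldest end while the head is older
    // than the window length or the window holds more than maxCount tuples.
    void add(long arrivalMs, long nowMs) {
        arrivals.addLast(arrivalMs);
        while (!arrivals.isEmpty()
                && (nowMs - arrivals.peekFirst() > windowLengthMs || arrivals.size() > maxCount)) {
            arrivals.removeFirst();
        }
    }

    int size() {
        return arrivals.size();
    }

    // Arrival time of the oldest tuple; the caller must ensure the window is non-empty.
    long oldest() {
        return arrivals.peekFirst();
    }
}
```

For example, `new BoundedTimeWindow(10_000, 3)` keeps at most three tuples, and after each add none is older than 10 seconds. Because the bound is an explicit parameter, no hidden cap can interfere with a count-based configuration.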


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)