You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/07/03 12:43:00 UTC

[jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16072367#comment-16072367 ] 

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user fhueske commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Thanks for your thoughts @sunjincheng121 and @wuchong.
    I thought about this again and agree with you. We should have a separate parameters to specify watermark adjustments and early firings.
    
    I propose the following:
    
    1. We add a parameter `lateDataTimeOffset` which adjusts the watermarks at the source (actually at all sources of a query) by injecting a custom operator. The parameter can be positive or negative and adjusts the watermarks. I think the name is good because the watermarks control the lateness of records. Also, Table API / SQL users should not need to know about the concept of watermarks. **This is done as part of this issue / PR.**
    
    2. We add a parameter `earlyResultTimeOffset` which defines the time when the first early result (e.g., of a windowed aggregate) is computed. The parameter should be negative, i.e., a value of `-30.mins` results in early results which start 30 minute before the watermark reaches the end of a window. 
    
    3. The same `updateRate` parameter as in [FLINK-6649](https://issues.apache.org/jira/browse/FLINK-6649) / PR #4157 is used to control how often early results are updated. I don't think we need a special parameters for early result or late data updates.
    
    **2. and 3. are addressed in a separate issue.**
    
    What do you think?


> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid updates of previous results. Instead of computing a result as soon as it is possible (i.e., when a corresponding watermark was received), deferred computation adds a configurable amount of slack time in which late data is accepted before the result is compute. For example, instead of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation interval of 15 minute to compute the result quarter past each full hour.
> This approach adds latency but can reduce the number of update esp. in use cases where the user cannot influence the generation of watermarks. It is also useful if the data is emitted to a system that cannot update result (files or Kafka). The deferred computation interval should be configured via the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)