Posted to issues@flink.apache.org by wuchong <gi...@git.apache.org> on 2017/01/20 03:26:43 UTC

[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...

Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3175
  
    Hi @hongyuhong, thank you for your work. However, it seems that you have misunderstood the SQL OVER syntax.
    The OVER clause defines a window, i.e. a user-specified set of rows within a query result set. A window function then computes a value for each row from the rows in that row's window. This is similar to the Row-Window proposed in [FLIP-11](https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations), but it differs from a sliding row-count window.
    
    For example, `OVER (ROWS 2 PRECEDING)` means that the window the function operates on is at most three rows in size: the two rows preceding the current row, plus the current row itself.
    
    Say we have a table `T1`:
    
    ```
    t  a  
    -----
    1  1 
    2  5 
    3  3 
    4  5 
    5  4 
    6 11
    ```
    
    Then the following SQL query produces the result shown beneath it:
    
    ```sql
    SELECT t, a, sum(a) OVER (ROWS 2 PRECEDING) FROM T1
    ```
    
    ```
    t  a  sum
    ----------
    1  1  1
    2  5  6
    3  3  9
    4  5  13
    5  4  12
    6 11  20
    ```
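
    To make the semantics concrete, here is a rough sketch in plain Python (not Flink code) that reproduces the result above. The function name `rows_preceding_sum` is made up for illustration, and the rows are assumed to arrive already ordered by `t`.

```python
from collections import deque


def rows_preceding_sum(values, preceding):
    """For each row, sum the current row and up to `preceding` earlier rows."""
    window = deque()  # rows currently inside the window
    total = 0         # running sum of the rows in `window`
    result = []
    for value in values:
        window.append(value)
        total += value
        # The window holds at most `preceding + 1` rows; drop the oldest one.
        if len(window) > preceding + 1:
            total -= window.popleft()
        result.append(total)
    return result


# Column `a` of T1, ordered by `t`.
a = [1, 5, 3, 5, 4, 11]
sums = rows_preceding_sum(a, 2)
print(sums)  # [1, 6, 9, 13, 12, 20]
```

    Note that one output row is emitted per input row, which is what distinguishes OVER from a sliding count window that emits one row per window.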
    
    For Row-window, we would need something more complex, especially when rows must be ordered by timestamp. For example, to support an event-time, count-based row-window, we need a custom operator that collects records in a priority queue ordered by timestamp. Once a watermark is received for the upper bound of a window, the priority queue is used to evaluate the window function (based on count) and to purge records that are too old.
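
    The operator described above could look roughly like the following plain-Python sketch. This is only an illustration of the idea, not Flink's operator API: the class `EventTimeRowsPrecedingSum` and its methods `on_record` and `on_watermark` are hypothetical names, the aggregate is fixed to a sum, and late records (arriving after a watermark has already passed their timestamp) are not handled.

```python
import heapq
from collections import deque


class EventTimeRowsPrecedingSum:
    """Hypothetical operator: sum over ROWS `preceding` PRECEDING in event time."""

    def __init__(self, preceding):
        self.preceding = preceding
        self.pending = []      # min-heap of (timestamp, seq, value) awaiting a watermark
        self.seq = 0           # tie-breaker so equal timestamps keep arrival order
        self.window = deque()  # the last `preceding + 1` rows already emitted
        self.total = 0         # running sum of the rows in `window`

    def on_record(self, timestamp, value):
        # Buffer the record; it may arrive out of order.
        heapq.heappush(self.pending, (timestamp, self.seq, value))
        self.seq += 1

    def on_watermark(self, watermark):
        # Every buffered record with timestamp <= watermark is now final,
        # so it can be evaluated in timestamp order.
        results = []
        while self.pending and self.pending[0][0] <= watermark:
            timestamp, _, value = heapq.heappop(self.pending)
            self.window.append(value)
            self.total += value
            # Purge the row that fell out of the count-based window.
            if len(self.window) > self.preceding + 1:
                self.total -= self.window.popleft()
            results.append((timestamp, value, self.total))
        return results


op = EventTimeRowsPrecedingSum(preceding=2)
for t, a in [(2, 5), (1, 1), (3, 3)]:  # out-of-order arrival
    op.on_record(t, a)
first = op.on_watermark(3)
for t, a in [(5, 4), (4, 5), (6, 11)]:
    op.on_record(t, a)
second = op.on_watermark(6)
print(first + second)
```

    With the rows of `T1` fed in out of order, the two watermarks together emit the same sums as the SQL result above.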
    
    I would suggest that this PR wait for FLINK-4679. Once FLINK-4679 is resolved, this PR can be supported easily, IMO.

