You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by hongyuhong <gi...@git.apache.org> on 2017/01/20 02:23:57 UTC

[GitHub] flink pull request #3175: [FLINK-5584]support sliding-count row-window on st...

GitHub user hongyuhong opened a pull request:

    https://github.com/apache/flink/pull/3175

    [FLINK-5584]support sliding-count row-window on streaming sql

    Calcite has already support sliding-count row-window, the grammar look like:
    select sum(amount) over (rows 10 preceding) from Order;
    select sum(amount) over (partition by user rows 10 preceding) from Order;
    And it will parse the sql as a LogicalWindow relnode, the logical Window contains aggregate func info and window info, it's similar to Flink LogicalWIndowAggregate, so we can add an convert rule to directly convert LogicalWindow into DataStreamAggregate relnode.
    
    1. Add HepPlanner to do the window optimize, cause valcano planner can not choose the ProjectToWindow optimize as the best.
    2. Add DataStreamWindowRule.scala to convert LogicalWindow to DataStreamAggregate.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hongyuhong/flink master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/3175.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3175
    
----
commit 2417e24ad676474df3c9fce6701024ca88c6a439
Author: hongyuhong 00223286 <ho...@huawei.com>
Date:   2017-01-20T02:02:12Z

    support sliding-count row-window on streaming sql

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3175
  
    Hi @hongyuhong , thank your for your job. But it seems that you misunderstand the SQL OVER syntax. 
    The OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. It is similar to Row-Window proposed in [FLIP-11](https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations), but is different with Sliding Row-count window.
    
    For example, OVER (ROWS 2 PRECEDING) means that the window of rows that the function operates on is three rows in size, starting with 2 rows preceding until and including the current row.
    
    Say we have a table `T1` 
    
    ```
    t  a  
    -----
    1  1 
    2  5 
    3  3 
    4  5 
    5  4 
    6 11
    ```
    
    and the following SQL will yield:
    
    ```sql
    SELECT t, a, sum(a) OVER (ROWS 2 PRECEDING) FROM T1
    ```
    
    ```
    t  a  avg
    ----------
    1  1  1
    2  5  6
    3  3  9
    4  5  13
    5  4  12
    6 11  20
    ```
    
    For Row-window, we would need something more complex, especially when we need to order by timestamp. For example, to support event-time count-window row-window, we need to create a custom operator that collects records in a priority queue ordered by timestamp. Once a watermark is received for the upper bound of a window, the priority queue is used to evaluate the window function (based on count) and to purge too old records. 
    
    I would suggest this PR to wait for FLINK-4679. When FLINK-4679 is fixed, this PR can be easily supported IMO.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...

Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:

    https://github.com/apache/flink/pull/3175
  
    Hi, @hongyuhong , thank your for your job.  Agree with @wuchong 's comments. I add same database OVER example for you:
    ```
    Example data:
    select * from  PeopleInfo
    ID         Name                 Gender                 Score
    6	LiHuan        	Man         	80
    7	LiHuan        	Man         	90
    8	LiMing        	Man         	56
    9	LiMing        	Woman         	60
    10	WangHua        	Woman         	80
      
    ```
    ```
     --Simple case 
    SELECT name, gender, count(name) OVER () AS num FROM PeopleInfo
    
    name                gender                 num
    LiHuan        	Man         	5
    LiHuan        	Man         	5
    LiMing        	Man         	5
    LiMing        	Woman         	5
    WangHua        Woman         	5
    
    ```
    ```
      --With ORDER BY case
    SELECT name,gender,score ROW_NUMBER() OVER (ORDER BY score ASC) AS num FROM PeopleInfo
    
    name                gender                   score   num
    LiMing        	Man         	56	1
    LiMing        	Woman         	60	2
    WangHua        	Woman         	80	3
    LiHuan        	Man         	80	4
    LiHuan        	Man         	90	5
    ```
    	  
       ```
    --With both PARTITION BY and  ORDER BY case
    SELECT [name],gender,score, ROW_NUMBER() OVER(PARTITION BY  Gender ORDER BY score ASC) as num
    FROM PeopleInfo;
    
    name                gender                   score   num
    LiMing        	Man         	56	1
    LiHuan        	Man         	80	2
    LiHuan        	Man         	90	3
    LiMing        	Woman         	60	1
    WangHua        	Woman         	80	2
    ```
      ```
     --With ROWS  PRECEDING and CURRENT ROW case
    SELECT name, gender, score, sum(score) OVER (PARTITION BY gender ORDER BY
       id ASC  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as  sum
    FROM PeopleInfo
    
    name                 gender                 score     sum
    LiHuan        	Man         	80	80
    LiHuan        	Man         	90	170
    LiMing        	Man         	56	226
    LiMing        	Woman         	60	60
    WangHua        	Woman         	80	140
    
    SELECT name, gender, score, sum(score) OVER (PARTITION BY Gender ORDER BY
       id ASC  ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) as sum
    FROM PeopleInfo
    
    name                 gender                score     sum
    LiHuan        	Man         	80	80
    LiHuan        	Man         	90	170
    LiMing        	Man         	56	146
    LiMing        	Woman         	60	60
    WangHua        	Woman         	80	140
    
    ```                
      ```
    --With ROWS FOLLOWING case   
    
    SELECT id, name, gender, score, sum(score) OVER (PARTITION BY Gender ORDER BY
       id ASC  ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) as sum
    FROM dbo.PeopleInfo
    
    id           name                 gender                score      sum
    6	LiHuan        	Man         	80	170
    7	LiHuan        	Man         	90	226
    8	LiMing        	Man         	56	146
    9	LiMing       	Woman         	60	140
    10	WangHua        	Woman         	80	140
    
    SELECT id,name, gender, score, sum(score) OVER (PARTITION BY gender ORDER BY
     id ASC  ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING ) as sum
    FROM PeopleInfo
     
    id           name                 gender                 score      sum
    6	LiHuan        	Man         	80	170
    7	LiHuan        	Man         	90	226
    8	LiMing        	Man         	56	226
    9	LiMing        	Woman         	60	140
    10	WangHua        	Woman         	80	140
    
    SELECT id, name, gender, score,sum(score) OVER (PARTITION BY gender ORDER BY
       id ASC  ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) as sum
    FROM PeopleInfo
    
    id           name                 gender                  score      sum
    8	LiMing        	Man         	56	146
    7	LiHuan        	Man         	90	226
    6	LiHuan        	Man         	80	226
    10	WangHua        	Woman         	80	140
    9	LiMing        	Woman         	60	140
    ```
    Thank you , SunJincheng.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...

Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/3175
  
    Hi @hongyuhong , don't worry about that. Very welcome to contribute to Flink. And please feel free to contact us if you have any question! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...

Posted by hongyuhong <gi...@git.apache.org>.
Github user hongyuhong commented on the issue:

    https://github.com/apache/flink/pull/3175
  
    Hi, @wuchong, @sunjincheng121 ,
    Thank you very much for correcting my mistake!I will study deeply and modify the code. The pr temporarily only consider the processing-time and preceding section, and just do a translate between sql and logicalplan, cause currently  i can't find a windowoperator to describe the following section and event-time. I agree with that if row-window has been implement on tableapi, the streaming sql can support easily.
    Thank you again for your advice.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] flink pull request #3175: [FLINK-5584]support sliding-count row-window on st...

Posted by hongyuhong <gi...@git.apache.org>.
Github user hongyuhong closed the pull request at:

    https://github.com/apache/flink/pull/3175


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---