You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by hongyuhong <gi...@git.apache.org> on 2017/01/20 02:23:57 UTC
[GitHub] flink pull request #3175: [FLINK-5584]support sliding-count row-window on st...
GitHub user hongyuhong opened a pull request:
https://github.com/apache/flink/pull/3175
[FLINK-5584]support sliding-count row-window on streaming sql
Calcite has already support sliding-count row-window, the grammar look like:
select sum(amount) over (rows 10 preceding) from Order;
select sum(amount) over (partition by user rows 10 preceding) from Order;
And it will parse the sql as a LogicalWindow relnode, the logical Window contains aggregate func info and window info, it's similar to Flink LogicalWIndowAggregate, so we can add an convert rule to directly convert LogicalWindow into DataStreamAggregate relnode.
1. Add HepPlanner to do the window optimize, cause valcano planner can not choose the ProjectToWindow optimize as the best.
2. Add DataStreamWindowRule.scala to convert LogicalWindow to DataStreamAggregate.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/hongyuhong/flink master
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/3175.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #3175
----
commit 2417e24ad676474df3c9fce6701024ca88c6a439
Author: hongyuhong 00223286 <ho...@huawei.com>
Date: 2017-01-20T02:02:12Z
support sliding-count row-window on streaming sql
----
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...
Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:
https://github.com/apache/flink/pull/3175
Hi @hongyuhong , thank your for your job. But it seems that you misunderstand the SQL OVER syntax.
The OVER clause defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. It is similar to Row-Window proposed in [FLIP-11](https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations), but is different with Sliding Row-count window.
For example, OVER (ROWS 2 PRECEDING) means that the window of rows that the function operates on is three rows in size, starting with 2 rows preceding until and including the current row.
Say we have a table `T1`
```
t a
-----
1 1
2 5
3 3
4 5
5 4
6 11
```
and the following SQL will yield:
```sql
SELECT t, a, sum(a) OVER (ROWS 2 PRECEDING) FROM T1
```
```
t a avg
----------
1 1 1
2 5 6
3 3 9
4 5 13
5 4 12
6 11 20
```
For Row-window, we would need something more complex, especially when we need to order by timestamp. For example, to support event-time count-window row-window, we need to create a custom operator that collects records in a priority queue ordered by timestamp. Once a watermark is received for the upper bound of a window, the priority queue is used to evaluate the window function (based on count) and to purge too old records.
I would suggest this PR to wait for FLINK-4679. When FLINK-4679 is fixed, this PR can be easily supported IMO.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...
Posted by sunjincheng121 <gi...@git.apache.org>.
Github user sunjincheng121 commented on the issue:
https://github.com/apache/flink/pull/3175
Hi, @hongyuhong , thank your for your job. Agree with @wuchong 's comments. I add same database OVER example for you:
```
Example data:
select * from PeopleInfo
ID Name Gender Score
6 LiHuan Man 80
7 LiHuan Man 90
8 LiMing Man 56
9 LiMing Woman 60
10 WangHua Woman 80
```
```
--Simple case
SELECT name, gender, count(name) OVER () AS num FROM PeopleInfo
name gender num
LiHuan Man 5
LiHuan Man 5
LiMing Man 5
LiMing Woman 5
WangHua Woman 5
```
```
--With ORDER BY case
SELECT name,gender,score ROW_NUMBER() OVER (ORDER BY score ASC) AS num FROM PeopleInfo
name gender score num
LiMing Man 56 1
LiMing Woman 60 2
WangHua Woman 80 3
LiHuan Man 80 4
LiHuan Man 90 5
```
```
--With both PARTITION BY and ORDER BY case
SELECT [name],gender,score, ROW_NUMBER() OVER(PARTITION BY Gender ORDER BY score ASC) as num
FROM PeopleInfo;
name gender score num
LiMing Man 56 1
LiHuan Man 80 2
LiHuan Man 90 3
LiMing Woman 60 1
WangHua Woman 80 2
```
```
--With ROWS PRECEDING and CURRENT ROW case
SELECT name, gender, score, sum(score) OVER (PARTITION BY gender ORDER BY
id ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) as sum
FROM PeopleInfo
name gender score sum
LiHuan Man 80 80
LiHuan Man 90 170
LiMing Man 56 226
LiMing Woman 60 60
WangHua Woman 80 140
SELECT name, gender, score, sum(score) OVER (PARTITION BY Gender ORDER BY
id ASC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW ) as sum
FROM PeopleInfo
name gender score sum
LiHuan Man 80 80
LiHuan Man 90 170
LiMing Man 56 146
LiMing Woman 60 60
WangHua Woman 80 140
```
```
--With ROWS FOLLOWING case
SELECT id, name, gender, score, sum(score) OVER (PARTITION BY Gender ORDER BY
id ASC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING ) as sum
FROM dbo.PeopleInfo
id name gender score sum
6 LiHuan Man 80 170
7 LiHuan Man 90 226
8 LiMing Man 56 146
9 LiMing Woman 60 140
10 WangHua Woman 80 140
SELECT id,name, gender, score, sum(score) OVER (PARTITION BY gender ORDER BY
id ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING ) as sum
FROM PeopleInfo
id name gender score sum
6 LiHuan Man 80 170
7 LiHuan Man 90 226
8 LiMing Man 56 226
9 LiMing Woman 60 140
10 WangHua Woman 80 140
SELECT id, name, gender, score,sum(score) OVER (PARTITION BY gender ORDER BY
id ASC ROWS BETWEEN 1 PRECEDING AND UNBOUNDED FOLLOWING ) as sum
FROM PeopleInfo
id name gender score sum
8 LiMing Man 56 146
7 LiHuan Man 90 226
6 LiHuan Man 80 226
10 WangHua Woman 80 140
9 LiMing Woman 60 140
```
Thank you , SunJincheng.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...
Posted by wuchong <gi...@git.apache.org>.
Github user wuchong commented on the issue:
https://github.com/apache/flink/pull/3175
Hi @hongyuhong , don't worry about that. Very welcome to contribute to Flink. And please feel free to contact us if you have any question!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] flink issue #3175: [FLINK-5584]support sliding-count row-window on streaming...
Posted by hongyuhong <gi...@git.apache.org>.
Github user hongyuhong commented on the issue:
https://github.com/apache/flink/pull/3175
Hi, @wuchong, @sunjincheng121 ,
Thank you very much for correcting my mistake!I will study deeply and modify the code. The pr temporarily only consider the processing-time and preceding section, and just do a translate between sql and logicalplan, cause currently i can't find a windowoperator to describe the following section and event-time. I agree with that if row-window has been implement on tableapi, the streaming sql can support easily.
Thank you again for your advice.
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
[GitHub] flink pull request #3175: [FLINK-5584]support sliding-count row-window on st...
Posted by hongyuhong <gi...@git.apache.org>.
Github user hongyuhong closed the pull request at:
https://github.com/apache/flink/pull/3175
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---