You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Lu Weizheng <lu...@hotmail.com> on 2020/03/01 09:48:45 UTC
Get Tumbling Window Top-K using SQL
Hi,
I find a question on StackOverflow(https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by) about how to get Top-K using Flink SQL, it was written by Fabian. It was backed in 2018.
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c
FROM (
SELECT
a, b, c,
RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank
FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?
And the wiki on dynamic table may need to update. https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/streaming/dynamic_tables.html
In the above wiki, I don't know why the query has a field 'lastLogin'
SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);
Thanks!
Re: Get Tumbling Window Top-K using SQL
Posted by Jark Wu <im...@gmail.com>.
Hi Weizheng,
You are right. You can use the TopN feature in blink planner. But note that
it doesn't support tumbling window topn, it is a topn without windowing and
event-time.
But you can achieve it by PARTITIONED BY <time-window>, the <time-window>
column could be a preprocessed column which represents which window does
this row belongs to, e.g. 1-hour windowing: "2020-03-02 10:00", "2020-03-02
11:00".
The tumbling window topn will be natively supported in the future.
Best,
Jark
On Mon, 2 Mar 2020 at 10:55, Lu Weizheng <lu...@hotmail.com> wrote:
> Sorry guys,
>
> I find solution on wiki about Top-N using Blink planner.
>
> SELECT [column_list]FROM (
> SELECT [column_list],
> ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
> ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
> FROM table_name)WHERE rownum <= N [AND conditions]
>
>
> thanks anyway.
> ------------------------------
> *发件人:* Lu Weizheng <lu...@hotmail.com>
> *发送时间:* 2020年3月1日 17:48
> *收件人:* user@flink.apache.org <us...@flink.apache.org>
> *主题:* Get Tumbling Window Top-K using SQL
>
> Hi,
>
> I find a question on StackOverflow(
> https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by)
> about how to get Top-K using Flink SQL, it was written by Fabian. It was
> backed in 2018.
> The main idea is using a RANK to get the Top K of filed 'a':
>
> SELECT a, b, c
> FROM (
> SELECT
> a, b, c,
> RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank
> FROM yourTable)
> WHERE rank <= 10
>
> is there better way to get tumbling window Top-K item now?
>
> And the wiki on dynamic table may need to update.
> https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/streaming/dynamic_tables.html
>
> In the above wiki, I don't know why the query has a field 'lastLogin'
>
> SELECT user, RANK() OVER (ORDER BY lastLogin)FROM (
> SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user);
>
>
> Thanks!
>
回复: Get Tumbling Window Top-K using SQL
Posted by Lu Weizheng <lu...@hotmail.com>.
Sorry guys,
I find solution on wiki about Top-N using Blink planner.
SELECT [column_list]
FROM (
SELECT [column_list],
ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
FROM table_name)
WHERE rownum <= N [AND conditions]
thanks anyway.
________________________________
发件人: Lu Weizheng <lu...@hotmail.com>
发送时间: 2020年3月1日 17:48
收件人: user@flink.apache.org <us...@flink.apache.org>
主题: Get Tumbling Window Top-K using SQL
Hi,
I find a question on StackOverflow(https://stackoverflow.com/questions/49191326/flink-stream-sql-order-by) about how to get Top-K using Flink SQL, it was written by Fabian. It was backed in 2018.
The main idea is using a RANK to get the Top K of filed 'a':
SELECT a, b, c
FROM (
SELECT
a, b, c,
RANK() OVER (ORDER BY a PARTITION BY CEIL(t TO MINUTE) BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) as rank
FROM yourTable)
WHERE rank <= 10
is there better way to get tumbling window Top-K item now?
And the wiki on dynamic table may need to update. https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/table/streaming/dynamic_tables.html
In the above wiki, I don't know why the query has a field 'lastLogin'
SELECT user, RANK() OVER (ORDER BY lastLogin)
FROM (
SELECT user, MAX(cTime) AS lastAction FROM clicks GROUP BY user
);
Thanks!