You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@flink.apache.org by Stefano Bortoli <st...@huawei.com> on 2017/10/17 15:16:49 UTC
GROUP BY TUMBLE on ROW range
Hi all,
Is there a way to use a tumble window group by with row range in streamSQL?
I mean, something like this:
// "SELECT COUNT(*) " +
// "FROM T1 " +
// "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"
However, even looking at tests and looking at the "row interval expression generation" I could not find any examples in SQL. I know it is supported by the stream APIs, and countWindow is the chosen abstraction.
table
.window(Tumble over 2.rows on 'long as 'w)
.groupBy('w)
.select('int.count)
.toDataSet[Row]
I fear I am missing something simple. Thanks a lot for the support guys!
Best,
Stefano
RE: GROUP BY TUMBLE on ROW range
Posted by Stefano Bortoli <st...@huawei.com>.
Great, thanks for the explanation. I noticed now indeed that the examples are for the table API. I believe over window is sufficient for the purpose right now, was just curious.
Best,
Stefano
From: Fabian Hueske [mailto:fhueske@gmail.com]
Sent: Tuesday, October 17, 2017 9:24 PM
To: Stefano Bortoli <st...@huawei.com>
Cc: user@flink.apache.org
Subject: Re: GROUP BY TUMBLE on ROW range
Hi Stefano,
this is not supported in Flink's SQL and we would need new Group Window functions (like TUMBLE) for this.
A TUMBLE_COUNT function would be somewhat similar to SESSION, which also requires checks on the sorted neighboring rows to identify the window of a row.
Such a function would first need to be added to Calcite and then integrated with Flink.
A tumble count could also be expressed in plain SQL but wouldn't be very intuitive. You would have to
- define an over window (maybe partitioned on some key) sorted on time with a ROW_NUMBER function that assigns increasing numbers to rows.
- do a group by on the row number modulo the window size.
Btw. count windows are supported by the Table API.
Best, Fabian
2017-10-17 17:16 GMT+02:00 Stefano Bortoli <st...@huawei.com>>:
Hi all,
Is there a way to use a tumble window group by with row range in streamSQL?
I mean, something like this:
// "SELECT COUNT(*) " +
// "FROM T1 " +
// "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"
However, even looking at tests and looking at the “row interval expression generation” I could not find any examples in SQL. I know it is supported by the stream APIs, and countWindow is the chosen abstraction.
table
.window(Tumble over 2.rows on 'long as 'w)
.groupBy('w)
.select('int.count)
.toDataSet[Row]
I fear I am missing something simple. Thanks a lot for the support guys!
Best,
Stefano
Re: GROUP BY TUMBLE on ROW range
Posted by Fabian Hueske <fh...@gmail.com>.
Hi Stefano,
this is not supported in Flink's SQL and we would need new Group Window
functions (like TUMBLE) for this.
A TUMBLE_COUNT function would be somewhat similar to SESSION, which also
requires checks on the sorted neighboring rows to identify the window of a
row.
Such a function would first need to be added to Calcite and then integrated
with Flink.
A tumble count could also be expressed in plain SQL but wouldn't be very
intuitive. You would have to
- define an over window (maybe partitioned on some key) sorted on time with
a ROW_NUMBER function that assigns increasing numbers to rows.
- do a group by on the row number modulo the window size.
Btw. count windows are supported by the Table API.
Best, Fabian
2017-10-17 17:16 GMT+02:00 Stefano Bortoli <st...@huawei.com>:
> Hi all,
>
>
>
> Is there a way to use a tumble window group by with row range in streamSQL?
>
>
>
> I mean, something like this:
>
>
>
> // "SELECT COUNT(*) " +
>
> // "FROM T1 " +
>
> // "GROUP BY TUMBLE(rowtime, INTERVAL '2' ROWS PRECEDING )"
>
>
>
> However, even looking at tests and looking at the “row interval expression
> generation” I could not find any examples in SQL. I know it is supported by
> the stream APIs, and countWindow is the chosen abstraction.
>
>
>
> table
>
> .window(Tumble over 2.rows on 'long as 'w)
>
> .groupBy('w)
>
> .select('int.count)
>
> .toDataSet[Row]
>
>
>
> I fear I am missing something simple. Thanks a lot for the support guys!
>
>
>
> Best,
>
> Stefano
>