You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user-zh@flink.apache.org by "casel.chen" <ca...@126.com> on 2021/08/11 09:06:31 UTC
flink sql聚合后collect收集数据问题
源表三个字段 name, color, ts
按时间窗口聚合后想根据name group by取colors数组
create table source_table (
name STRING,
color STRING,
ts TIMESTAMP,
WATERMARK ts for ts
)
create table sink_table (
name STRING,
colors ARRAY<STRING>
)
1. 请问这个select语句要怎么写?
select name, collect(color) as colors from source_table group by tumble(ts, interval '5' seconds)
这里collect(color)返回的是multiset类型,怎样转成Array类型呢?
2. 如果array元素很多,我只想取其中N个,该怎么写flink sql?
3, 若取出现次数最多的前N个,又该怎么写flink sql?
select name, collect(color) as colors from (
select name, color from (
select *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY color_cnt desc) AS row_num from (
select name, color, count(*) as color_cnt group by name, color, tumble(ts, interval '5' seconds)
)
) where row_num < 5
);
是这样写么?
Re: flink sql聚合后collect收集数据问题
Posted by Caizhi Weng <ts...@gmail.com>.
Hi!
1 & 2. multiset 不能转成 array。可以考虑使用 listagg + split_index + limit
语句达成需要的效果。当然更方便的还是写一个 UDF。
3. window top-n 可以使用 1.13 新引入的 window tvf:
https://ci.apache.org/projects/flink/flink-docs-release-1.13/zh/docs/dev/table/sql/queries/window-topn/
casel.chen <ca...@126.com> 于2021年8月11日周三 下午5:06写道:
> 源表三个字段 name, color, ts
> 按时间窗口聚合后想根据name group by取colors数组
>
>
> create table source_table (
> name STRING,
> color STRING,
> ts TIMESTAMP,
> WATERMARK ts for ts
> )
>
>
> create table sink_table (
> name STRING,
> colors ARRAY<STRING>
> )
>
>
> 1. 请问这个select语句要怎么写?
> select name, collect(color) as colors from source_table group by
> tumble(ts, interval '5' seconds)
> 这里collect(color)返回的是multiset类型,怎样转成Array类型呢?
>
>
> 2. 如果array元素很多,我只想取其中N个,该怎么写flink sql?
>
> 3, 若取出现次数最多的前N个,又该怎么写flink sql?
> select name, collect(color) as colors from (
> select name, color from (
> select *, ROW_NUMBER() OVER (PARTITION BY name ORDER BY color_cnt desc)
> AS row_num from (
> select name, color, count(*) as color_cnt group by name, color, tumble(ts,
> interval '5' seconds)
> )
> ) where row_num < 5
> );
> 是这样写么?