Posted to user@flink.apache.org by Danny Chan <yu...@gmail.com> on 2020/08/27 06:39:47 UTC

[Survey] Demand collection for stream SQL window join

Hi users, I would like to collect use cases for the window join [1], which is already supported in the DataStream API. The goal is to decide whether to support it on the SQL side as well. For example, a join of two tumbling windows might look like this:

```sql
SELECT ... window_start, window_end
FROM TABLE(
  TUMBLE(
    DATA => TABLE table_a,
    TIMECOL => DESCRIPTOR(rowtime),
    SIZE => INTERVAL '1' MINUTE)) tumble_a
[LEFT | RIGHT | FULL OUTER] JOIN TABLE(
  TUMBLE(
    DATA => TABLE table_b,
    TIMECOL => DESCRIPTOR(rowtime),
    SIZE => INTERVAL '1' MINUTE)) tumble_b
ON tumble_a.col1 = tumble_b.col1 AND ...
```

I have discussed this offline with several companies (Tencent, Bytedance and Meituan), and interval join seems to be the most common case. Window join use cases are rare, so I am looking forward to feedback here.
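To make the difference between the two join types concrete, here is a minimal sketch in plain Python (not the Flink API) of how each one pairs up records. It assumes records are `(key, event_time_ms, value)` tuples, and the stream names `orders` and `payments` are hypothetical. It only illustrates inner-join semantics and ignores watermarks, state cleanup and outer joins.

```python
from collections import defaultdict


def tumbling_window_join(left, right, size_ms):
    """Inner-join records that share a key and fall into the same tumbling window."""
    buckets = defaultdict(list)
    for key, ts, value in right:
        window_start = ts - ts % size_ms  # start of the window containing ts
        buckets[(key, window_start)].append(value)

    out = []
    for key, ts, value in left:
        window_start = ts - ts % size_ms
        for other in buckets[(key, window_start)]:
            out.append((key, window_start, window_start + size_ms, value, other))
    return out


def interval_join(left, right, lower_ms, upper_ms):
    """Inner-join records that share a key where
    left.ts + lower_ms <= right.ts <= left.ts + upper_ms."""
    out = []
    for lkey, lts, lval in left:
        for rkey, rts, rval in right:
            if lkey == rkey and lts + lower_ms <= rts <= lts + upper_ms:
                out.append((lkey, lval, rval))
    return out


# Hypothetical sample data: two orders on either side of the 60 s window
# boundary, and one payment just after the boundary.
orders = [("k1", 59_000, "order-1"), ("k1", 61_000, "order-2")]
payments = [("k1", 60_500, "pay-1")]

window_result = tumbling_window_join(orders, payments, 60_000)
interval_result = interval_join(orders, payments, -2_000, 2_000)

print(window_result)
print(interval_result)
```

With a one-minute tumbling window, `order-1` and `pay-1` land in different windows even though they are only 1.5 seconds apart, so the window join matches only `order-2`. The interval join with a bound of plus or minus two seconds matches both orders. Use cases that truly depend on fixed window boundaries are the ones of interest for this survey.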

In particular, it would be appreciated if you could share your window join use cases (implemented with the Flink DataStream API or with other programs) and explain why the window join is a must (that is, why it cannot be replaced with a regular stream join or an interval join).

Thanks in advance ~

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/joining.html

Best,
Danny Chan

Re: [Survey] Demand collection for stream SQL window join

Posted by Jark Wu <im...@gmail.com>.
Thanks for the survey!

I'm also interested in the use cases of DataStream window join.

Best,
Jark
