You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@flink.apache.org by Dan Hill <qu...@gmail.com> on 2020/12/31 05:53:57 UTC

Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

Hi!

I'm using Flink SQL to do an interval join.  Rows in one of the tables are
not unique.  I'm fine using either the first or last row.  When I try to
deduplicate
<https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#deduplication>
and
then interval join, I get the following error.

IntervalJoin doesn't support consuming update and delete changes which is
produced by node Rank(strategy=[UndefinedStrategy], rankType=[ROW_NUMBER],
rankRange=[rankStart=1, rankEnd=1], partitionBy=[log_user_id], orderBy=[ts
ASC], select=[platform_id, user_id, log_user_id, client_log_ts,
event_api_ts, ts])

Is there a way to combine these in this order?  I could do the
deduplication afterwards but this will result in more state.

- Dan

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

Posted by Dan Hill <qu...@gmail.com>.

Hey, sorry for the late reply.  I'm using v1.11.1.

Cool.  I did a non-SQL way of using the first row.  I'll try to see if I
can do this in the SQL version.

On Wed, Jan 13, 2021 at 11:26 PM Jark Wu <im...@gmail.com> wrote:

> Hi Dan,
>
> Sorry for the late reply.
>
> I guess you applied a "deduplication with keeping last row" before the
> interval join?
> That will produce an updating stream and interval join only supports
> append-only input.
> You can try to apply "deduplication with keeping *first* row" before the
> interval join.
> That should produce an append-only stream and interval join can consume
> from it.
>
> Best,
> Jark
>
>
>
> On Tue, 5 Jan 2021 at 20:07, Arvid Heise <ar...@ververica.com> wrote:
>
>> Hi Dan,
>>
>> Which Flink version are you using? I know that there has been quite a bit
>> of optimization of deduplication in 1.12, which would reduce the required
>> state tremendously.
>> I'm pulling in Jark who knows more.
>>
>> On Thu, Dec 31, 2020 at 6:54 AM Dan Hill <qu...@gmail.com> wrote:
>>
>>> Hi!
>>>
>>> I'm using Flink SQL to do an interval join.  Rows in one of the tables
>>> are not unique.  I'm fine using either the first or last row.  When I try
>>> to deduplicate
>>> <https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#deduplication> and
>>> then interval join, I get the following error.
>>>
>>> IntervalJoin doesn't support consuming update and delete changes which
>>> is produced by node Rank(strategy=[UndefinedStrategy],
>>> rankType=[ROW_NUMBER], rankRange=[rankStart=1, rankEnd=1],
>>> partitionBy=[log_user_id], orderBy=[ts ASC], select=[platform_id, user_id,
>>> log_user_id, client_log_ts, event_api_ts, ts])
>>>
>>> Is there a way to combine these in this order?  I could do the
>>> deduplication afterwards but this will result in more state.
>>>
>>> - Dan
>>>
>>
>>
>> --
>>
>> Arvid Heise | Senior Java Developer
>>
>> <https://www.ververica.com/>
>>
>> Follow us @VervericaData
>>
>> --
>>
>> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
>> Conference
>>
>> Stream Processing | Event Driven | Real Time
>>
>> --
>>
>> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>>
>> --
>> Ververica GmbH
>> Registered at Amtsgericht Charlottenburg: HRB 158244 B
>> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
>> (Toni) Cheng
>>
>

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

Posted by Jark Wu <im...@gmail.com>.

Hi Dan,

Sorry for the late reply.

I guess you applied a "deduplication with keeping last row" before the
interval join?
That will produce an updating stream and interval join only supports
append-only input.
You can try to apply "deduplication with keeping *first* row" before the
interval join.
That should produce an append-only stream and interval join can consume
from it.

Best,
Jark



On Tue, 5 Jan 2021 at 20:07, Arvid Heise <ar...@ververica.com> wrote:

> Hi Dan,
>
> Which Flink version are you using? I know that there has been quite a bit
> of optimization of deduplication in 1.12, which would reduce the required
> state tremendously.
> I'm pulling in Jark who knows more.
>
> On Thu, Dec 31, 2020 at 6:54 AM Dan Hill <qu...@gmail.com> wrote:
>
>> Hi!
>>
>> I'm using Flink SQL to do an interval join.  Rows in one of the tables
>> are not unique.  I'm fine using either the first or last row.  When I try
>> to deduplicate
>> <https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#deduplication> and
>> then interval join, I get the following error.
>>
>> IntervalJoin doesn't support consuming update and delete changes which is
>> produced by node Rank(strategy=[UndefinedStrategy], rankType=[ROW_NUMBER],
>> rankRange=[rankStart=1, rankEnd=1], partitionBy=[log_user_id], orderBy=[ts
>> ASC], select=[platform_id, user_id, log_user_id, client_log_ts,
>> event_api_ts, ts])
>>
>> Is there a way to combine these in this order?  I could do the
>> deduplication afterwards but this will result in more state.
>>
>> - Dan
>>
>
>
> --
>
> Arvid Heise | Senior Java Developer
>
> <https://www.ververica.com/>
>
> Follow us @VervericaData
>
> --
>
> Join Flink Forward <https://flink-forward.org/> - The Apache Flink
> Conference
>
> Stream Processing | Event Driven | Real Time
>
> --
>
> Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
>
> --
> Ververica GmbH
> Registered at Amtsgericht Charlottenburg: HRB 158244 B
> Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
> (Toni) Cheng
>

Re: Flink SQL - IntervalJoin doesn't support consuming update and delete - trying to deduplicate rows

Posted by Arvid Heise <ar...@ververica.com>.

Hi Dan,

Which Flink version are you using? I know that there has been quite a bit
of optimization of deduplication in 1.12, which would reduce the required
state tremendously.
I'm pulling in Jark who knows more.

On Thu, Dec 31, 2020 at 6:54 AM Dan Hill <qu...@gmail.com> wrote:

> Hi!
>
> I'm using Flink SQL to do an interval join.  Rows in one of the tables are
> not unique.  I'm fine using either the first or last row.  When I try to
> deduplicate
> <https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/sql/queries.html#deduplication> and
> then interval join, I get the following error.
>
> IntervalJoin doesn't support consuming update and delete changes which is
> produced by node Rank(strategy=[UndefinedStrategy], rankType=[ROW_NUMBER],
> rankRange=[rankStart=1, rankEnd=1], partitionBy=[log_user_id], orderBy=[ts
> ASC], select=[platform_id, user_id, log_user_id, client_log_ts,
> event_api_ts, ts])
>
> Is there a way to combine these in this order?  I could do the
> deduplication afterwards but this will result in more state.
>
> - Dan
>


-- 

Arvid Heise | Senior Java Developer

<https://www.ververica.com/>

Follow us @VervericaData

--

Join Flink Forward <https://flink-forward.org/> - The Apache Flink
Conference

Stream Processing | Event Driven | Real Time

--

Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany

--
Ververica GmbH
Registered at Amtsgericht Charlottenburg: HRB 158244 B
Managing Directors: Timothy Alexander Steinert, Yip Park Tung Jason, Ji
(Toni) Cheng