You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@flink.apache.org by "Jark Wu (Jira)" <ji...@apache.org> on 2020/08/10 09:03:00 UTC

[jira] [Commented] (FLINK-18871) Non-deterministic function could break retract mechanism

    [ https://issues.apache.org/jira/browse/FLINK-18871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17174196#comment-17174196 ] 

Jark Wu commented on FLINK-18871:
---------------------------------

That's true. This is a known issue. Users should pay attention when using non-deterministic function between aggregations (or other operators produce changelog). A solution might be to support Stateful Calc operator, that emit the retractions with previous value instead of generate new values from the non-deterministic functions. 

> Non-deterministic function could break retract mechanism
> --------------------------------------------------------
>
>                 Key: FLINK-18871
>                 URL: https://issues.apache.org/jira/browse/FLINK-18871
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Runtime
>            Reporter: Benchao Li
>            Priority: Major
>
> For example, we have a following SQL:
> {code:sql}
> create view view1 as
> select 
>   max(a) as m1,
>   max(b) as m2 -- b is a timestmap
> from T
> group by c, d;
> create view view2 as
> select * from view1
> where m2 > CURRENT_TIMESTAMP;
> insert into MySink
> select sum(m1) as m1 
> from view2
> group by c;
> {code}
> view1 will produce retract messages, and the same message in view2 maybe produce different results. and the second agg will produce wrong result.
>  For example,
> {noformat}
> view1:
> + (1, 2020-8-10 16:13:00)
> - (1, 2020-8-10 16:13:00)
> + (2, 2020-8-10 16:13:10)
> view2:
> + (1, 2020-8-10 16:13:00)
> - (1, 2020-8-10 16:13:00) // this record may be filtered out
> + (2, 2020-8-10 16:13:10)
> MySink:
> + (1, 2020-8-10 16:13:00)
> + (2, 2020-8-10 16:13:10) // will produce wrong result.
> {noformat}
> In general, the non-deterministic function may break the retract mechanism. All operators downstream which will rely on the retraction mechanism will produce wrong results, or throw exception, such as Agg / some Sink which need retract message / TopN / Window.
> (The example above is a simplified version of some production jobs in our scenario, just to explain the core problem)
> CC [~ykt836] [~jark]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)