You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Julian Hyde (Jira)" <ji...@apache.org> on 2020/02/19 18:13:00 UTC

[jira] [Commented] (CALCITE-3760) Rewriting non-deterministic function can break query semantics

    [ https://issues.apache.org/jira/browse/CALCITE-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17040289#comment-17040289 ] 

Julian Hyde commented on CALCITE-3760:
--------------------------------------

I don't think this PR approaches the problem the right way:
* I don't yet see a need to add {{boolean isDeterministic()}} method to {{org.apache.calcite.schema.Function}}. We already have {{boolean SqlOperator.isDeterministic()}} and we have the annotation {{org.apache.calcite.linq4j.function.Deterministic}} to be applied to Java UDFs.
* Figuring out whether an expression is deterministic by tree-walking into the expression, as {{SqlUtil.isDeterminstic}} does, is the wrong approach. It will work on the original query, but the determinism will get broken during rewrite.
* I think the robust solution is to ensure that operators that are non-deterministic appear only as the topmost expression in a {{Project}}. That way they are computed only once, and any expression referencing them is using a field that has already been computed. Probably it would be up to {{SqlToRelConverter}} to carve up expressions so that non-deterministic operators are on top. And up to {{ProjectMergeRule}} and similar rules to ensure that they stay on top.
* The definition of 'deterministic' is ambiguous. The documentation should state explicitly whether functions that return the same result for the duration of a query (e.g. {{CURRENT_TIMESTAMP}}) or the same result for the current row (e.g. {{NEXT_VALUE}} of a sequence) are considered to be deterministic. Really we need deterministic to be an enum with 4 values, not a boolean. Maybe there's a fifth: a function that must be evaluated even if its result is not used.

> Rewriting non-deterministic function can break query semantics
> --------------------------------------------------------------
>
>                 Key: CALCITE-3760
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3760
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>            Reporter: Jin Xing
>            Assignee: Jin Xing
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Calcite rewrite some *SqlFunctions* during validation. But whether the function is deterministic is not considered. For a non-deterministic operator, the rewriting can break semantics. Additionally there's no interface for user to specify the determinism for a UDF/UDAF. 
> Say I have non-deterministic UDF & UDAF and run sql like below
> {code:java}
> select coalesce(udf(col0), 100) from foo;
> select nullif(udaf(col0), 1024) from foo;{code}
> They will be rewritten as
> {code:java}
> select case when udf(col0) is not null then udf(col0) else 100 end
> from foo;
> select case when udaf(col0)=1024 then null udaf(col0)
> from foo{code}
> As we can see that non-deterministic UDF & UDAF are called multiple times after written. Thus the condition in WHEN clause might NOT be held all the time.
> We need to provide an interface for user to specify the determinism in UDF/UDAF and consider whether a SqlNode is deterministic when rewriting.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)