Posted to issues@spark.apache.org by "Tanel Kiis (Jira)" <ji...@apache.org> on 2020/09/21 20:24:00 UTC

[jira] [Commented] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR

    [ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199646#comment-17199646 ] 

Tanel Kiis commented on SPARK-32928:
------------------------------------

One more place where this can manifest is FilterExec reordering isNotNull predicates:

{code:title=Test SQL file}
-- Test non-deterministic filter predicates with codegen on and off.
--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN

CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a);

SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5;
{code}

{code:title=Generated output file}
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 2


-- !query
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a)
-- !query schema
struct<>
-- !query output



-- !query
SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5
-- !query schema
struct<a:int>
-- !query output
3
4
8
{code}

{code:title=Error on running the test}
23:16:44.013 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - deterministic.sql *** FAILED *** (1 second, 955 milliseconds)
[info]   deterministic.sql
[info]   Expected "3
[info]   4
[info]   8[]", but got "3
[info]   4
[info]   8[
[info]   9]" Result did not match for query #1
{code}
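To see why the order matters, here is a small standalone Scala model of the query above. It is a sketch, not Spark code. It assumes that AND short-circuits (as Spark's And does) and that each Rand(seed) keeps its own generator state. CountingRand and ReorderDemo are hypothetical names, and scala.util.Random stands in for Spark's own generator, so the concrete rows will not match the output above. Whichever conjunct runs second is evaluated only for rows that passed the first one. Swapping the order therefore changes how many draws each generator makes, and with it which rows survive.

{code:scala}
import scala.util.Random

// Hypothetical stand-in for one Rand(seed) expression: a seeded generator
// that also counts how many times it has been evaluated. Spark uses its own
// per-partition generator, so the values differ from what Spark would produce.
final class CountingRand(seed: Long) {
  private val rng = new Random(seed)
  private var n = 0
  def next(): Double = { n += 1; rng.nextDouble() }
  def calls: Int = n
}

object ReorderDemo {
  // Filters rows with a short-circuiting AND of the two predicates from the
  // query above, in either order. Returns the kept rows and how often each
  // generator was drawn.
  def run(rows: Seq[Int], rand1First: Boolean): (Seq[Int], Int, Int) = {
    val rand0 = new CountingRand(0) // RAND(0) inside the IF
    val rand1 = new CountingRand(1) // RAND(1) > 0.5
    // NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) holds exactly when RAND(0) <= 0.5,
    // because a itself is never NULL in testData.
    def notNullPred(): Boolean = !(rand0.next() > 0.5)
    def randPred(): Boolean = rand1.next() > 0.5
    val kept = rows.filter { _ =>
      // && short-circuits: the right operand is skipped when the left is false
      if (rand1First) randPred() && notNullPred()
      else notNullPred() && randPred()
    }
    (kept, rand0.calls, rand1.calls)
  }
}
{code}

With rand1First = false, the RAND(0) generator is drawn for all 10 rows and RAND(1) only for the rows that passed the first check. With rand1First = true, the roles are reversed.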

> Non-deterministic expressions should not be reordered inside AND and OR
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32928
>                 URL: https://issues.apache.org/jira/browse/SPARK-32928
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Tanel Kiis
>            Priority: Major
>
> Using the splitDisjunctivePredicates and splitConjunctivePredicates helper methods can change the number of times a non-deterministic expression is executed. This can cause correctness issues on the client side.
> An existing test in the FilterPushdownSuite seems to exhibit this problem:
> {code}
> test("generate: non-deterministic predicate referenced no generated column") {
>     val originalQuery = {
>       testRelationWithArrayType
>         .generate(Explode('c_arr), alias = Some("arr"))
>         .where(('b >= 5) && ('a + Rand(10).as("rnd") > 6) && ('col > 6))
>     }
>     val optimized = Optimize.execute(originalQuery.analyze)
>     val correctAnswer = {
>       testRelationWithArrayType
>         .where('b >= 5)
>         .generate(Explode('c_arr), alias = Some("arr"))
>         .where('a + Rand(10).as("rnd") > 6 && 'col > 6)
>         .analyze
>     }
>     comparePlans(optimized, correctAnswer)
>   }
> {code}
> In the optimized plan, the deterministic filter is moved ahead of the non-deterministic one:
> {code}
> Filter ((6 < none#0) AND (cast(6 as double) < (rand(10) + cast(none#0 as double))))
> {code}
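One way to state the constraint from the title is that only conjuncts before the first non-deterministic one may be moved. Below is a minimal Scala sketch on a hypothetical expression tree that uses the predicates from the FilterPushdownSuite test above. Expr, Pred, And and ConjunctOrder are illustrative names, not Catalyst classes. hoistAll reproduces the problematic reordering. hoistPrefix uses span to split off only the deterministic prefix and keeps the rest in its original order.

{code:scala}
// Hypothetical, simplified expression tree; not Catalyst's classes.
sealed trait Expr { def deterministic: Boolean }
final case class Pred(sql: String, deterministic: Boolean) extends Expr
final case class And(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object ConjunctOrder {
  // Flatten nested ANDs left to right (the role splitConjunctivePredicates plays).
  def split(e: Expr): List[Expr] = e match {
    case And(l, r) => split(l) ++ split(r)
    case p         => List(p)
  }

  // Unsafe: hoist every deterministic conjunct ahead of the non-deterministic
  // ones, which changes how often a later non-deterministic conjunct runs.
  def hoistAll(cs: List[Expr]): List[Expr] = {
    val (det, nonDet) = cs.partition(_.deterministic)
    det ++ nonDet
  }

  // Safer: only the deterministic prefix may move (e.g. be pushed down);
  // the first non-deterministic conjunct and everything after it stay in order.
  def hoistPrefix(cs: List[Expr]): (List[Expr], List[Expr]) =
    cs.span(_.deterministic)
}
{code}

For that test, split yields [b >= 5, a + rand(10) > 6, col > 6]. hoistAll moves col > 6 ahead of the rand predicate, as in the optimized plan quoted above, whereas hoistPrefix leaves b >= 5 as the only movable conjunct.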



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
