Posted to issues@spark.apache.org by "Tanel Kiis (Jira)" <ji...@apache.org> on 2020/09/17 20:11:00 UTC

[jira] [Updated] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR

     [ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanel Kiis updated SPARK-32928:
-------------------------------
    Description: 
Using the splitDisjunctivePredicates and splitConjunctivePredicates helper methods can change the number of times a non-deterministic expression is executed. This can cause correctness issues on the client side.

An existing test in FilterPushdownSuite appears to exhibit this problem:
{code}
test("generate: non-deterministic predicate referenced no generated column") {
  val originalQuery = {
    testRelationWithArrayType
      .generate(Explode('c_arr), alias = Some("arr"))
      .where(('b >= 5) && ('a + Rand(10).as("rnd") > 6) && ('col > 6))
  }
  val optimized = Optimize.execute(originalQuery.analyze)
  val correctAnswer = {
    testRelationWithArrayType
      .where('b >= 5)
      .generate(Explode('c_arr), alias = Some("arr"))
      .where('a + Rand(10).as("rnd") > 6 && 'col > 6)
      .analyze
  }

  comparePlans(optimized, correctAnswer)
}
{code}

In the optimized plan, the deterministic predicate ('col > 6) is placed ahead of the non-deterministic one ('a + Rand(10) > 6), reversing the order in which they were written:
{code}
Filter ((6 < none#0) AND (cast(6 as double) < (rand(10) + cast(none#0 as double))))
{code}
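
To illustrate why the order matters, here is a minimal standalone sketch that does not use Spark at all. It assumes AND is evaluated with short-circuiting, so the right operand is skipped when the left one is false. Pred, Leaf, And, splitConjuncts, reorder and countCalls are all hypothetical names made up for this example. The reorder function is only a stand-in for any rewrite that splits a conjunction and recombines it in a different order; it is not the actual optimizer rule.

```scala
// Minimal stand-in for a predicate tree. This is NOT Spark's Expression API.
sealed trait Pred { def deterministic: Boolean }

// A leaf predicate evaluated against a "row" (here just an Int).
final case class Leaf(name: String, deterministic: Boolean, eval: Int => Boolean) extends Pred

// A conjunction is deterministic only if both sides are.
final case class And(left: Pred, right: Pred) extends Pred {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object ReorderSketch {
  // Flatten nested ANDs into a list of conjuncts, left to right.
  def splitConjuncts(p: Pred): List[Pred] = p match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => List(other)
  }

  // Short-circuit evaluation: the right side is skipped when the left is false.
  def eval(p: Pred, row: Int): Boolean = p match {
    case And(l, r)     => eval(l, row) && eval(r, row)
    case Leaf(_, _, f) => f(row)
  }

  // Hypothetical rewrite (an assumption for this sketch): split the
  // conjunction, then rebuild it with the deterministic conjuncts first.
  def reorder(p: Pred): Pred = {
    val (det, nonDet) = splitConjuncts(p).partition(_.deterministic)
    (det ++ nonDet).reduce[Pred]((l, r) => And(l, r))
  }

  // Builds "nonDet AND (row > 6)", applies `rewrite`, evaluates it on every
  // row and returns how many times the non-deterministic leaf was invoked.
  def countCalls(rewrite: Pred => Pred, rows: Seq[Int]): Int = {
    var calls = 0
    val nonDet = Leaf("rnd", deterministic = false, _ => { calls += 1; true })
    val det = Leaf("col > 6", deterministic = true, _ > 6)
    val pred = rewrite(And(nonDet, det))
    rows.foreach(row => eval(pred, row))
    calls
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(1, 5, 7, 10)
    println(s"original order:  ${countCalls(identity, rows)} calls") // 4 calls
    println(s"reordered:       ${countCalls(reorder, rows)} calls")  // 2 calls
  }
}
```

With the original order the non-deterministic leaf runs once per row. After the reordering it runs only for rows that pass the deterministic check, so a seeded generator would hand out different values to the surviving rows.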

  was:Using the splitDisjunctivePredicates and splitConjunctivePredicates helper methods can change the number of times a non-deterministic expression is executed. This can cause correctness issues on the client side.


> Non-deterministic expressions should not be reordered inside AND and OR
> -----------------------------------------------------------------------
>
>                 Key: SPARK-32928
>                 URL: https://issues.apache.org/jira/browse/SPARK-32928
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Tanel Kiis
>            Priority: Major
>


