Posted to issues@spark.apache.org by "Tanel Kiis (Jira)" <ji...@apache.org> on 2020/09/21 20:24:00 UTC
[jira] [Commented] (SPARK-32928) Non-deterministic expressions should not be reordered inside AND and OR
[ https://issues.apache.org/jira/browse/SPARK-32928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17199646#comment-17199646 ]
Tanel Kiis commented on SPARK-32928:
------------------------------------
One more place where this can manifest is FilterExec reordering isNotNull predicates:
{code:title=Test SQL file}
-- Test non-deterministic filter predicates with codegen on and off.
--CONFIG_DIM1 spark.sql.codegen.wholeStage=true
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
--CONFIG_DIM1 spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=NO_CODEGEN
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a);
SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5;
{code}
{code:title=Generated output file}
-- Automatically generated by SQLQueryTestSuite
-- Number of queries: 2
-- !query
CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
AS testData(a)
-- !query schema
struct<>
-- !query output
-- !query
SELECT a FROM testData WHERE NOT ISNULL(IF(RAND(0) > 0.5, NULL, a)) AND RAND(1) > 0.5
-- !query schema
struct<a:int>
-- !query output
3
4
8
{code}
{code:title=Error on running the test}
23:16:44.013 ERROR org.apache.spark.sql.SQLQueryTestSuite: Error using configs: spark.sql.codegen.wholeStage=false,spark.sql.codegen.factoryMode=CODEGEN_ONLY
[info] - deterministic.sql *** FAILED *** (1 second, 955 milliseconds)
[info] deterministic.sql
[info] Expected "3
[info] 4
[info] 8[]", but got "3
[info] 4
[info] 8[
[info] 9]" Result did not match for query #1
{code}
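The following is a minimal sketch in plain Scala (no Spark) of why the order of the two conjuncts matters. It assumes each RAND(seed) behaves like a stateful generator that hands out its next value every time it is evaluated. ScriptedRand is a hypothetical stand-in that replays a fixed, made-up sequence instead of Spark's real generator, so the results are easy to follow. Because && short-circuits, the second conjunct runs only for rows that pass the first one. Swapping the order therefore changes which random value each row sees. The swapped order here only stands in for a code path that evaluates the conjuncts differently.

{code:title=Sketch: conjunct order changes which random values each row sees}
// Hypothetical stand-in for a seeded RAND(seed): it replays a fixed sequence,
// so every evaluation consumes the next value, as a seeded generator would.
final class ScriptedRand(values: Vector[Double]) {
  private var pos = 0
  def calls: Int = pos
  def next(): Double = {
    val v = values(pos % values.length)
    pos += 1
    v
  }
}

object ConjunctOrderDemo {
  // Filters rows 1..4 with the two conjuncts of the query above, in either order.
  // Returns the kept rows and how many times each "RAND" was evaluated.
  def run(isNotNullFirst: Boolean): (List[Int], Int, Int) = {
    val rand0 = new ScriptedRand(Vector(0.1, 0.9, 0.2, 0.3)) // made-up values for RAND(0)
    val rand1 = new ScriptedRand(Vector(0.6, 0.7, 0.4, 0.8)) // made-up values for RAND(1)

    // NOT ISNULL(IF(RAND(0) > 0.5, NULL, a))
    def p1(a: Int): Boolean = {
      val ifResult: Option[Int] = if (rand0.next() > 0.5) None else Some(a)
      ifResult.isDefined
    }
    // RAND(1) > 0.5
    def p2(a: Int): Boolean = rand1.next() > 0.5

    // && short-circuits, so the second conjunct only runs for rows that pass the first.
    val kept = (1 to 4).filter { a =>
      if (isNotNullFirst) p1(a) && p2(a) else p2(a) && p1(a)
    }.toList
    (kept, rand0.calls, rand1.calls)
  }

  def main(args: Array[String]): Unit = {
    println(run(isNotNullFirst = true))
    println(run(isNotNullFirst = false))
  }
}
{code}

With these made-up sequences, the original order keeps rows 1 and 3 and the swapped order keeps rows 1 and 4. Both streams hold exactly the same values in each run. Only the number of draws triggered by each row has changed.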
> Non-deterministic expressions should not be reordered inside AND and OR
> -----------------------------------------------------------------------
>
> Key: SPARK-32928
> URL: https://issues.apache.org/jira/browse/SPARK-32928
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.1.0
> Reporter: Tanel Kiis
> Priority: Major
>
> Using the splitDisjunctivePredicates and splitConjunctivePredicates helper methods and then recombining the split predicates in a different order can change the number of times a non-deterministic expression is evaluated, because AND and OR short-circuit. This can cause user-visible correctness issues in query results.
> An existing test in the FilterPushdownSuite seems to exhibit this problem:
> {code}
> test("generate: non-deterministic predicate referenced no generated column") {
>   val originalQuery = {
>     testRelationWithArrayType
>       .generate(Explode('c_arr), alias = Some("arr"))
>       .where(('b >= 5) && ('a + Rand(10).as("rnd") > 6) && ('col > 6))
>   }
>   val optimized = Optimize.execute(originalQuery.analyze)
>   val correctAnswer = {
>     testRelationWithArrayType
>       .where('b >= 5)
>       .generate(Explode('c_arr), alias = Some("arr"))
>       .where('a + Rand(10).as("rnd") > 6 && 'col > 6)
>       .analyze
>   }
>   comparePlans(optimized, correctAnswer)
> }
> {code}
> In the optimized plan, the deterministic filter is moved ahead of the non-deterministic one:
> {code}
> Filter ((6 < none#0) AND (cast(6 as double) < (rand(10) + cast(none#0 as double))))
> {code}
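One possible direction is sketched below purely as an illustration. It is not Spark's actual API or fix. When splitting a conjunction, treat only the deterministic prefix (the conjuncts before the first non-deterministic one) as movable. Keep everything from the first non-deterministic conjunct onward in its original order. Moving that prefix earlier leaves the non-deterministic conjunct evaluated for the same rows as before. The Expr, Pred and And types are a hypothetical toy model, not Catalyst classes, and the example condition mirrors the test above.

{code:title=Sketch: split off only the deterministic prefix of a conjunction}
// Hypothetical toy expression model (not Catalyst): a leaf predicate knows
// whether it is deterministic, and And is deterministic only if both sides are.
sealed trait Expr { def deterministic: Boolean }
final case class Pred(sql: String, deterministic: Boolean) extends Expr
final case class And(left: Expr, right: Expr) extends Expr {
  def deterministic: Boolean = left.deterministic && right.deterministic
}

object SafeSplit {
  // Flattens nested ANDs into a left-to-right list of conjuncts.
  def splitConjuncts(e: Expr): List[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => List(other)
  }

  // Hypothetical rule: only the conjuncts before the first non-deterministic
  // one may be moved; the rest keep their original relative order.
  def movableAndRest(e: Expr): (List[Expr], List[Expr]) =
    splitConjuncts(e).span(_.deterministic)

  def main(args: Array[String]): Unit = {
    val cond = And(And(Pred("b >= 5", true), Pred("a + rand(10) > 6", false)), Pred("col > 6", true))
    println(movableAndRest(cond))                           // span: order-preserving split
    println(splitConjuncts(cond).partition(_.deterministic)) // partition: for comparison
  }
}
{code}

With span, only b >= 5 is movable, and the rand conjunct stays ahead of col > 6, as in correctAnswer. A plain partition(_.deterministic) would instead group col > 6 with b >= 5, ahead of the rand conjunct. That is the same relative order as in the optimized plan above.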