You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by maryannxue <gi...@git.apache.org> on 2018/10/11 16:57:45 UTC

[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

GitHub user maryannxue opened a pull request:

    https://github.com/apache/spark/pull/22701

    [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely

    ## What changes were proposed in this pull request?
    
    The HandleNullInputsForUDF rule can generate new If node infinitely, thus causing problems like match of SQL cache missed.
    This was fixed in SPARK-24891 and was then broken by SPARK-25044.
    The unit test in `AnalysisSuite` added in SPARK-24891 should have failed but didn't because it wasn't properly updated after the `ScalaUDF` constructor signature change. So this PR also updates the test accordingly based on the new `ScalaUDF` constructor.
    
    ## How was this patch tested?
    
    Updated the original UT. This should be justified as the original UT became invalid after SPARK-25044.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maryannxue/spark spark-25690

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22701.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22701
    
----
commit 736625b3f280dcd6caed83b3cec82b72d4f0c0fc
Author: maryannxue <ma...@...>
Date:   2018-10-11T16:45:28Z

    [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF does not stabilize and can be applied infinitely

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3902/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22701


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    **[Test build #97283 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97283/testReport)** for PR 22701 at commit [`dfa301e`](https://github.com/apache/spark/commit/dfa301ebdf289d6501a8c0edf44e35e76a043c7d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    **[Test build #97283 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97283/testReport)** for PR 22701 at commit [`dfa301e`](https://github.com/apache/spark/commit/dfa301ebdf289d6501a8c0edf44e35e76a043c7d).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224596662
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2151,7 +2151,7 @@ class Analyzer(
                 // TODO: skip null handling for not-nullable primitive inputs after we can completely
                 // trust the `nullable` information.
                 val inputsNullCheck = nullableTypes.zip(inputs)
    -              .filter { case (nullable, _) => !nullable }
    +              .filter { case (nullable, expr) => !nullable && !expr.isInstanceOf[KnownNotNull] }
    --- End diff --
    
    let us use the original way? in the PR https://github.com/apache/spark/pull/21851
    ```Scala
    val needsNullCheck = ... 
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    **[Test build #97273 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97273/testReport)** for PR 22701 at commit [`736625b`](https://github.com/apache/spark/commit/736625b3f280dcd6caed83b3cec82b72d4f0c0fc).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224620279
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2150,8 +2150,10 @@ class Analyzer(
     
                 // TODO: skip null handling for not-nullable primitive inputs after we can completely
                 // trust the `nullable` information.
    +            val needsNullCheck = (nullable: Boolean, expr: Expression) =>
    --- End diff --
    
    Should this param be something like `cantBeNull` or something? this receives `!nullableType` as its arg but is called `nullable`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    LGTM
    
    Thanks! Merged to master/2.4


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224606888
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
         val a = testRelation.output(0)
         val func = (x: Int, y: Int) => x + y
    -    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
    -    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
    +    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
    --- End diff --
    
    OK, it sounds like we will have another 2.4 RC anyway, so we should get all of these changes in.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224658264
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
    @@ -2150,8 +2150,10 @@ class Analyzer(
     
                 // TODO: skip null handling for not-nullable primitive inputs after we can completely
                 // trust the `nullable` information.
    +            val needsNullCheck = (nullable: Boolean, expr: Expression) =>
    --- End diff --
    
    Yes, that's because "nullableType" is flipped around here. "nullableType" should really be "cantBeNull" or "doesntNeedNullCheck". I'll change this in other PR.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224602234
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
         val a = testRelation.output(0)
         val func = (x: Int, y: Int) => x + y
    -    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
    -    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
    +    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
    --- End diff --
    
    It's two separate issues.
    If `nullableTypes` is not added here, the `HandleNullInputsForUDF` will do nothing, which means null checks will be missed. So it is itself a problem, which can be potentially triggered by a user.
    As to test, if the rule is not doing anything, the "doing something infinitely" bug cannot be reproduced. But the infinite issue is one on a theoretical level and is quite unlikely to have any end-user impact, thanks to @rxin's fix for SPARK-24865.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97273/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97283/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3894/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputs...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22701#discussion_r224547656
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -351,8 +351,8 @@ class AnalysisSuite extends AnalysisTest with Matchers {
       test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
         val a = testRelation.output(0)
         val func = (x: Int, y: Int) => x + y
    -    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil)
    -    val udf2 = ScalaUDF(func, IntegerType, a :: udf1 :: Nil)
    +    val udf1 = ScalaUDF(func, IntegerType, a :: a :: Nil, nullableTypes = false :: false :: Nil)
    --- End diff --
    
    So clearly we should make both of these changes. This change fixes the test here. But is the change above, involving `KnownNotNull`, important for correctness? that is can some user code trigger this infinite loop you mention in SPARK-24891? I'm trying to figure out whether the change here is absolutely required for 2.4, or an important change that could happen in 2.4.1.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    **[Test build #97273 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97273/testReport)** for PR 22701 at commit [`736625b`](https://github.com/apache/spark/commit/736625b3f280dcd6caed83b3cec82b72d4f0c0fc).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22701: [SPARK-25690][SQL] Analyzer rule HandleNullInputsForUDF ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22701
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org