You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by mgaido91 <gi...@git.apache.org> on 2018/10/13 17:25:32 UTC

[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in order...

GitHub user mgaido91 opened a pull request:

    https://github.com/apache/spark/pull/22713

    [SPARK-25691][SQL] Use semantic equality in order to compare attributes

    ## What changes were proposed in this pull request?
    
    When we compare attributes, in general, we should always refer to semantic equality, as the default `equal` method can return false when there are "cosmetic" differences between them, but still they are the same thing; at least we have to consider them so when analyzing/optimizing queries.
    
    The PR focuses on the usage and comparison of the `output` of a `LogicalPlan`, which is a `Seq[Attribute]`. In this case, using equality implicitly fails to check the semantic equality. In analyzer and optimizer, the occurrences of such a comparison have been replaced with proper comparison using semantic equality.
    
    ## How was this patch tested?
    
    running the tests with the patch provided by @maryannxue in https://github.com/apache/spark/pull/22060

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mgaido91/spark SPARK-25691

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22713.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22713
    
----
commit 87fc62ab80324552ab6811aa05b7d897402a5464
Author: Marco Gaido <ma...@...>
Date:   2018-10-13T17:17:18Z

    [SPARK-25691][SQL] Use semantic equality in order to compare attributes

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Could we just hold this PR until we can ensure the expression ID are unique in the whole tree? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    @cloud-fan I would answer yes. At least, I haven't found a real case when this can cause a bug visible to end users, but I am not 100% that there is no weird case where someone could hit it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    thanks, merging to master!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4032/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3947/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Thanks @gatorsmile, I agree. Then I reduced the scope to only the analyzer rules and added the UT for the only rule changed (other than the `AliasViewChild` one which is tested with the code provided by @maryannxue).
    
    I removed the changes in the `Analyzer`, because those cases are hard to test, in the sense that I have not found operators causing problems for them, as they rely on working on operators which bring the very same `output` (eg. in one case the very same plan is used, in the other all the operators I have seen just do `def output = child.output`, so the output is really the same). So despite I think the changes there should be done, I am ok leaving them out of scope here as they are of no harm.
    
    Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225115756
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -112,8 +112,8 @@ object EliminateView extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         // The child should have the same output attributes with the View operator, so we simply
         // remove the View operator.
    -    case View(_, output, child) =>
    -      assert(output == child.output,
    +    case v @ View(_, output, child) =>
    +      assert(v.sameOutput(child),
    --- End diff --
    
    Is it possible to come out a test case showing previous comparing is problematic?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97450 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97450/testReport)** for PR 22713 at commit [`0d334e3`](https://github.com/apache/spark/commit/0d334e33dcbbfbbf3c69cd0c26b5ce497a77675c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97380/testReport)** for PR 22713 at commit [`c4aaa8d`](https://github.com/apache/spark/commit/c4aaa8d07042eebb3a3425011fdef913230364a6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225119359
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -112,8 +112,8 @@ object EliminateView extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         // The child should have the same output attributes with the View operator, so we simply
         // remove the View operator.
    -    case View(_, output, child) =>
    -      assert(output == child.output,
    +    case v @ View(_, output, child) =>
    +      assert(v.sameOutput(child),
    --- End diff --
    
    +1 for adding a test case. BTW does it impact end-users? If it does we need to backport it to 2.4.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #98268 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98268/testReport)** for PR 22713 at commit [`0e49e91`](https://github.com/apache/spark/commit/0e49e91acd6a9ab18993fb9c0ea34ddc200cc902).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3962/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97344/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    I do not want to take any risk unless you can have a test case to show the benefit of each change. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97361/testReport)** for PR 22713 at commit [`64aafc5`](https://github.com/apache/spark/commit/64aafc5fa08d995f508065d3d33e9b29bb4da6c5).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3960/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    @gatorsmile thanks for your comment. Honestly I don't see much value in holding, as we are (nearly always) using semantic equality throughout the Analyzer/Optimizer. So different expressions with the same ID are already causing troubles. Here we are changing/introducing no news about that. Usually, instead, not using semantic equality results in a bug (that's also the reason why we have `AttributeSet` for instance).
    
    I agree on enforcing that exprIds are unique on the plan, otherwise we do can have issues, but we would already have regardless of this PR.
    
    cc also @cloud-fan for his thoughts.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97361/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Optim...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225025827
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAliasAndProjectSuite.scala ---
    @@ -124,4 +124,11 @@ class RemoveRedundantAliasAndProjectSuite extends PlanTest with PredicateHelper
         val expected = Subquery(relation.select('a as "a", 'b).where('b < 10).select('a).analyze)
         comparePlans(optimized, expected)
       }
    +
    +  test("SPARK-25691: RemoveRedundantProject works also with different cases") {
    +    val relation = LocalRelation('a.int, 'b.int)
    +    val query = relation.select('A, 'b).analyzeCaseInsensitive
    +    val optimized = Optimize.execute(query)
    +    comparePlans(optimized, relation)
    --- End diff --
    
    I agree that using `==` on attributes is error-prone, but we should update then one-by-one, to narrow down the scope and make sure the change is reasonable.
    
    For instance, I don't think this is a valid case. If we optimize it, the final schema field names will change, which is a breaking change if this plan an input of a parquet writing plan. (the result parquet files will have a different schema)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r229313621
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -604,4 +606,28 @@ class AnalysisSuite extends AnalysisTest with Matchers {
           checkAnalysis(input, expected)
         }
       }
    +
    +  test("SPARK-25691: AliasViewChild with different nullabilities") {
    +    object ViewAnalyzer extends RuleExecutor[LogicalPlan] {
    +      val batches = Batch("View", Once, AliasViewChild(conf), EliminateView) :: Nil
    +    }
    +    def intNotNullableAttr(name: String): Attribute = {
    --- End diff --
    
    nit: since this method is called with the same name. maybe we don't need it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97450/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4627/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    LGTM


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in Optimizer in...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97361/testReport)** for PR 22713 at commit [`64aafc5`](https://github.com/apache/spark/commit/64aafc5fa08d995f508065d3d33e9b29bb4da6c5).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `sealed class BaseSimpleAnalyzer(caseSensitive: Boolean) extends Analyzer(`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r229307114
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/analysis/AnalysisSuite.scala ---
    @@ -604,4 +606,28 @@ class AnalysisSuite extends AnalysisTest with Matchers {
           checkAnalysis(input, expected)
         }
       }
    +
    +  test("SPARK-25691: AliasViewChild with different nullabilities") {
    +    object ViewAnalyzer extends RuleExecutor[LogicalPlan] {
    +      val batches = Batch("View", Once, AliasViewChild(conf), EliminateView) :: Nil
    +    }
    +    def intNotNullableAttr(name: String): Attribute = {
    +      AttributeReference(name, IntegerType, nullable = false)()
    +    }
    +    val relation = LocalRelation(intNotNullableAttr("a"), 'b.string)
    --- End diff --
    
    nit: `'a.int.notNull`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97344 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97344/testReport)** for PR 22713 at commit [`87fc62a`](https://github.com/apache/spark/commit/87fc62ab80324552ab6811aa05b7d897402a5464).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    LGTM. To confirm, this is a potential bug, currently end-users can't hit it, right?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225486497
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -112,8 +112,8 @@ object EliminateView extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         // The child should have the same output attributes with the View operator, so we simply
         // remove the View operator.
    -    case View(_, output, child) =>
    -      assert(output == child.output,
    +    case v @ View(_, output, child) =>
    +      assert(v.sameOutput(child),
    --- End diff --
    
    I added the check and I found a case which may be considered as a bug (not sure honestly, it is a weird situation which I think might occur, but it a bad condition, which we may want to handle differently).
    
    Currently the rule doesn't work well when the output of the view and the output of its child differs because of some nullable. You can find an example in the UT I added, where the view has all the output attributes as nullable, while the child has one as not-nullable. In this case, we are currently failing with an exception in the optimizer rule `EliminateView`. After the change, the plan is correctly created.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97450 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97450/testReport)** for PR 22713 at commit [`0d334e3`](https://github.com/apache/spark/commit/0d334e33dcbbfbbf3c69cd0c26b5ce497a77675c).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97380/testReport)** for PR 22713 at commit [`c4aaa8d`](https://github.com/apache/spark/commit/c4aaa8d07042eebb3a3425011fdef913230364a6).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225123064
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/view.scala ---
    @@ -112,8 +112,8 @@ object EliminateView extends Rule[LogicalPlan] {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         // The child should have the same output attributes with the View operator, so we simply
         // remove the View operator.
    -    case View(_, output, child) =>
    -      assert(output == child.output,
    +    case v @ View(_, output, child) =>
    +      assert(v.sameOutput(child),
    --- End diff --
    
    I am not sure if we can test this in a way different from running @maryannxue's checks. I'll try to find one in the next days. As of now, I have no evidence that this impacts end-users. If I'll find out a case, I'll notify you. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    anymore comments on this @cloud-fan @maryannxue @viirya ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #98268 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98268/testReport)** for PR 22713 at commit [`0e49e91`](https://github.com/apache/spark/commit/0e49e91acd6a9ab18993fb9c0ea34ddc200cc902).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by maryannxue <gi...@git.apache.org>.
Github user maryannxue commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    We do need a test case here anyway. Ideally it would be just as simple as #22701 but the difficulty is in declaring a view.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225103000
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAliasAndProjectSuite.scala ---
    @@ -124,4 +124,11 @@ class RemoveRedundantAliasAndProjectSuite extends PlanTest with PredicateHelper
         val expected = Subquery(relation.select('a as "a", 'b).where('b < 10).select('a).analyze)
         comparePlans(optimized, expected)
       }
    +
    +  test("SPARK-25691: RemoveRedundantProject works also with different cases") {
    +    val relation = LocalRelation('a.int, 'b.int)
    +    val query = relation.select('A, 'b).analyzeCaseInsensitive
    +    val optimized = Optimize.execute(query)
    +    comparePlans(optimized, relation)
    --- End diff --
    
    Spark can be case-sensitive or not w.r.t. the config, but Spark should always be case-preserving.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98268/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in order to com...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    **[Test build #97344 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/97344/testReport)** for PR 22713 at commit [`87fc62a`](https://github.com/apache/spark/commit/87fc62ab80324552ab6811aa05b7d897402a5464).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/3973/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22713: [SPARK-25691][SQL] Use semantic equality in AliasViewChi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22713
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/97380/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Alias...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22713


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #22713: [SPARK-25691][SQL] Use semantic equality in Optim...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22713#discussion_r225091990
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/RemoveRedundantAliasAndProjectSuite.scala ---
    @@ -124,4 +124,11 @@ class RemoveRedundantAliasAndProjectSuite extends PlanTest with PredicateHelper
         val expected = Subquery(relation.select('a as "a", 'b).where('b < 10).select('a).analyze)
         comparePlans(optimized, expected)
       }
    +
    +  test("SPARK-25691: RemoveRedundantProject works also with different cases") {
    +    val relation = LocalRelation('a.int, 'b.int)
    +    val query = relation.select('A, 'b).analyzeCaseInsensitive
    +    val optimized = Optimize.execute(query)
    +    comparePlans(optimized, relation)
    --- End diff --
    
    thanks for you comment. Then let me focus for this only to the view topic, we can open other tickets for each change later.
    
    > For instance, I don't think this is a valid case.
    
    I see the concern about the possible breaking change, so I agree about not introducing this. My point is: then we are saying that Spark is never really case-insensitive, even though the case sensitive option is turned to false, isn't it? Shouldn't datasources write/read columns in a non-case-sensitive way when this flag is turned on?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org