You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by henryr <gi...@git.apache.org> on 2018/02/28 00:52:25 UTC

[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

GitHub user henryr opened a pull request:

    https://github.com/apache/spark/pull/20687

    [SPARK-25000][SQL] Fix complex type simplification rules to apply to entire plan

    ## What changes were proposed in this pull request?
    
    Complex type simplification optimizer rules were not applied to the
    entire plan, just the expressions reachable from the root node. This
    patch fixes the rules to transform the entire plan.
    
    ## How was this patch tested?
    
    New unit test + ran sql / core tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/henryr/spark spark-25000

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20687.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20687
    
----
commit f446fa2482f2ccdfa6fec7243bb25b2e232da5fa
Author: Henry Robinson <he...@...>
Date:   2018-02-28T00:42:17Z

    [SPARK-25000][SQL] Fix complex type simplification rules to apply to entire plan
    
    ## What changes were proposed in this pull request?
    
    Complex type simplification optimizer rules were not applied to the
    entire plan, just the expressions reachable from the root node. This
    patch fixes the rules to transform the entire plan.
    
    ## How was this patch tested?
    
    New unit test + sql / core tests.

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r175149719
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -19,57 +19,47 @@ package org.apache.spark.sql.catalyst.optimizer
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.catalyst.plans.logical.Aggregate
    --- End diff --
    
    > import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, LogicalPlan}


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174974033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -19,57 +19,47 @@ package org.apache.spark.sql.catalyst.optimizer
     
     import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
    +import org.apache.spark.sql.catalyst.plans.logical.Aggregate
    --- End diff --
    
    This should be before line 21 in alphabetical order.
    You can check this locally with `dev/scalastyle`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    cc @dongjoon-hyun Do you want to review this PR? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87763 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87763/testReport)** for PR 20687 at commit [`f446fa2`](https://github.com/apache/spark/commit/f446fa2482f2ccdfa6fec7243bb25b2e232da5fa).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174620713
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    Since `map` is not orderable, that happens for `struct` and `array` types.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173323846
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    Done, thanks. I filed SPARK-23634 to fix this. Out of interest, why does `AttributeReference` cache the nullability of its referent? Is it because comparison is too expensive to do if you have to follow a level of indirection to get to the original attribute? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r172985682
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,30 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    // If nullable attributes aren't used, the array and map test cases fail because array
    +    // and map indexing can return null so the output is marked nullable.
    --- End diff --
    
    why? I think the optimization is still valid, we should show this in the test, instead of hiding it with a nullable attribute.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r172992262
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,30 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    // If nullable attributes aren't used, the array and map test cases fail because array
    +    // and map indexing can return null so the output is marked nullable.
    --- End diff --
    
    The optimization works either way, but in (for example) the map case, `m1` is marked as nullable in the original plan because presumably `GetMapValue(CreateMap(...))` can return `null` if the key is not in the map. 
    
    So for the expected plan to compare the same as the original, it has to be reading a nullable attribute - otherwise the plans don't pass `comparePlans`.  I moved and reworded the comment to hopefully clarify this a bit.
    
    There's an opportunity to fix this up again after the rule completes (since some attributes could be marked too conservatively as nullable). Do you think that's something we should pursue for this PR?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1542/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    @henryr Thanks for your great work!


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174663562
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,37 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // These tests must use nullable attributes from the base relation for the following reason:
    +    // in the 'original' plans below, the Aggregate node produced by groupBy() has a
    +    // nullable AttributeReference to a1, because both array indexing and map lookup are
    +    // nullable expressions. After optimization, the same attribute is now non-nullable,
    +    // but the AttributeReference is not updated to reflect this. In the 'expected' plans,
    +    // the grouping expressions have the same nullability as the original attribute in the
    +    // relation. If that attribute is non-nullable, the tests will fail as the plans will
    +    // compare differently, so for these tests we must use a nullable attribute. See
    +    // SPARK-23634.
    +    val arrayRel = relation
    +      .select(GetArrayItem(CreateArray(Seq('nullable_id, 'nullable_id + 1L)), 0) as "a1")
    +      .groupBy($"a1")("1").analyze
    +    val arrayExpected = relation.select('nullable_id as "a1").groupBy($"a1")("1").analyze
    +    comparePlans(Optimizer execute arrayRel, arrayExpected)
    +
    +    val mapRel = relation
    +      .select(GetMapValue(CreateMap(Seq("id", 'nullable_id)), "id") as "m1")
    +      .groupBy($"m1")("1").analyze
    +    val mapExpected = relation
    +      .select('nullable_id as "m1")
    +      .groupBy($"m1")("1").analyze
    +    comparePlans(Optimizer execute mapRel, mapExpected)
    --- End diff --
    
    @henryr .
    Could you add more test cases mentioned today, for example, like the following?
    ```
        val structRel = relation.groupBy(
          CreateNamedStruct(Seq("att1", 'nullable_id)))(
          GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None)).analyze
        comparePlans(Optimizer execute structRel, structRel)
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88035 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88035/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173012266
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    `nullable` is mostly calculated on demand, so we don't have rules to change the `nullable` property. For this case, the expression is `Alias(GetArrayItem(CreateArray(Attribute...)))`, which is nullable. After optimize, it becomes `Alias(Attribute...)` and is not nullable(if that attribute is not nullable). So the `nullable` is updated automatically.
    
    I don't know why you hit this issue, please ping us if you can't figure it out, we can help to debug.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173519701
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    --- End diff --
    
    nit. Could you fix the indentation?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173520165
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    +      // Remove redundant field extraction.
           case GetStructField(createNamedStructLike: CreateNamedStructLike, ordinal, _) =>
             createNamedStructLike.valExprs(ordinal)
    -    }
    -  }
    -}
     
    -/**
    -* push down operations into [[CreateArray]].
    -*/
    -object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field selection (array of structs)
    +      // Remove redundant array indexing.
           case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
    --- End diff --
    
    nit.
    ```
    case GetArrayStructFields(CreateArray(elems), field, ordinal, _, _) =>
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1619/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1435/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174606237
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    Expression-level optimizer simplifies both `aggregateExpressions` and `groupingExpressions` together. If the target expression exists at somewhere of both sides, the simplified expression also exists at the same locations of both sides. Given that, `semanticEquals` will work for the updated expressions.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88024/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    This failing because of SPARK-23606, which seems unrelated (I haven't been able to trigger it in local builds, at least). 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173667671
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    Sorry for late response, @gatorsmile .
    These are expression-level optimization rules. If the original expressions exists in `SELECT`, `GROUP BY`, and `HAVING`, those are simplified in the same way together. Do you have any concerning cases?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r171328612
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -25,8 +25,8 @@ import org.apache.spark.sql.catalyst.rules.Rule
     * push down operations into [[CreateNamedStructLike]].
     */
     object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    --- End diff --
    
    +1 for @cloud-fan 's advice.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173007679
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    It's a good question! I'm not too familiar with how nullability is marked and unmarked during planning. My understanding is roughly that the analyzer resolves all the plan's expressions and in doing so marks attributes as nullable or not. After that it's not clear that the optimizer revisits any of those nullability decisions. Is there an optimizer pass which should make nullability marking more precise? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    I didn't retrigger Jenkins due to the existing comment.
    Overall, the PR looks reasonable to me. I look forward to see the follow-up issue, SPARK-23634.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173331455
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    Because `AttributeReference` is not only used as a reference of an attribute from children, but also the new attributes produced by leaf nodes, which has to carry the nullable info. It's not ideal but it's too late to change now.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88280/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r172300096
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +331,24 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'id)), 0, None) as "foo")
    +      .select('foo).analyze
    --- End diff --
    
    Thanks for the pointer. I replaced the projection with an aggregation.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1121/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88035 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88035/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88056/testReport)** for PR 20687 at commit [`f66112c`](https://github.com/apache/spark/commit/f66112c05898611b4efc21c69a1a1c987a537df2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r171328520
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +331,24 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'id)), 0, None) as "foo")
    +      .select('foo).analyze
    --- End diff --
    
    @henryr Could you update the test cases properly? Actually, this will not provide the test coverage of your PR properly because of `CollapseProject` at line 40.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173006092
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    This explains why the original plan(before optimize) marks its output as nullable, but I'm confused why the optimized plan still marks its output as nullable.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    The fix looks good to me, but the test coverage is not enough. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88024/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173013094
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    Thanks, that's plenty of information to get started - I'll dig into it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-25000][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r171174781
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -25,8 +25,8 @@ import org.apache.spark.sql.catalyst.rules.Rule
     * push down operations into [[CreateNamedStructLike]].
     */
     object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    --- End diff --
    
    nit: can we merge these 3 rules? then we only need to transform the plan once.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88056/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173583729
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    @dongjoon-hyun , is it safe to simplify it for Aggregate? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88135 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88135/testReport)** for PR 20687 at commit [`c2137d7`](https://github.com/apache/spark/commit/c2137d78ab474f2ccbcc944eb3654d9b1bdbdcef).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1344/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87739/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)** for PR 20687 at commit [`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88280 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88280/testReport)** for PR 20687 at commit [`8adaa47`](https://github.com/apache/spark/commit/8adaa4748b7b0d7990b45e58d82c7d9f2248923f).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174606778
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    how about `select struct(a, b).a from t group by struct(a, b)`? We may optimize it to `select a from t group by struct(a, b)`, which is invalid.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r172984333
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
     object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    --- End diff --
    
    `SimplifyExtractValueOps`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173561415
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87739 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87739/testReport)** for PR 20687 at commit [`f446fa2`](https://github.com/apache/spark/commit/f446fa2482f2ccdfa6fec7243bb25b2e232da5fa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87971/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173268999
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    @cloud-fan I looked again at this briefly this morning. The issue is that it's the `AttributeReference` in the top-level `Aggregate`'s `groupingExpressions` that has inconsistent nullability. 
    
    The `AttributeReference` in the original plan was originally created with `nullable=true`, before optimization. So at that point it's kind of fixed unless the optimizer dereferences the attr reference and realises that the target is no longer nullable. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88391/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174637789
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    FWIW I think this is a particular example of a more general problem where expression simplification can break the correspondence between a select expression and its grouping equivalent.
    
    Here's a simpler example:
    
    `SELECT (a + b) - a FROM t GROUP BY a + b`
    
    gets me the following:
    
    `org.apache.spark.sql.AnalysisException: expression 't.`b`' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.`
    
    Postgres also has this problem, at least in 9.3: `ERROR: column "t.a" must appear in the GROUP BY clause or be used in an aggregate function Position: 23`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88135/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173270388
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,31 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // If nullable attributes aren't used in the 'expected' plans, the array and map test
    +    // cases fail because array and map indexing can return null so the output attribute
    --- End diff --
    
    good catch! Let's explain this in the test and fix it in a follow-up. We can just add a new rule to transform the plan and update the `nullability`.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    @henryr Please try to add the test cases that matter in your opinion. I will also submit a follow-up PR to add more test cases after this PR is merged.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Retest this please.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174974450
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/complexTypesSuite.scala ---
    @@ -331,4 +330,47 @@ class ComplexTypesSuite extends PlanTest with ExpressionEvalHelper {
           .analyze
         comparePlans(Optimizer execute rel, expected)
       }
    +
    +  test("SPARK-23500: Simplify complex ops that aren't at the plan root") {
    +    val structRel = relation
    +      .select(GetStructField(CreateNamedStruct(Seq("att1", 'nullable_id)), 0, None) as "foo")
    +      .groupBy($"foo")("1").analyze
    +    val structExpected = relation
    +      .select('nullable_id as "foo")
    +      .groupBy($"foo")("1").analyze
    +    comparePlans(Optimizer execute structRel, structExpected)
    +
    +    // These tests must use nullable attributes from the base relation for the following reason:
    +    // in the 'original' plans below, the Aggregate node produced by groupBy() has a
    +    // nullable AttributeReference to a1, because both array indexing and map lookup are
    +    // nullable expressions. After optimization, the same attribute is now non-nullable,
    +    // but the AttributeReference is not updated to reflect this. In the 'expected' plans,
    +    // the grouping expressions have the same nullability as the original attribute in the
    +    // relation. If that attribute is non-nullable, the tests will fail as the plans will
    +    // compare differently, so for these tests we must use a nullable attribute. See
    +    // SPARK-23634.
    +    val arrayRel = relation
    +      .select(GetArrayItem(CreateArray(Seq('nullable_id, 'nullable_id + 1L)), 0) as "a1")
    +      .groupBy($"a1")("1").analyze
    +    val arrayExpected = relation.select('nullable_id as "a1").groupBy($"a1")("1").analyze
    +    comparePlans(Optimizer execute arrayRel, arrayExpected)
    +
    +    val mapRel = relation
    +      .select(GetMapValue(CreateMap(Seq("id", 'nullable_id)), "id") as "m1")
    +      .groupBy($"m1")("1").analyze
    +    val mapExpected = relation
    +      .select('nullable_id as "m1")
    +      .groupBy($"m1")("1").analyze
    +    comparePlans(Optimizer execute mapRel, mapExpected)
    +
    --- End diff --
    
    It seems that the current test case become too long. For the following negative cases, let's split to another test case. Maybe, with the following title?
    ```
    test("SPARK-23500: Aggregation expressions should not be simplified.")
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87971/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    @gatorsmile thank you for the reviews! Are there specific test cases you'd like to see? I've checked correlated and uncorrelated subqueries, various flavours of join, aggregates with HAVING clauses, nested compound types, and so on. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1362/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88024/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173520459
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    +      // Remove redundant field extraction.
           case GetStructField(createNamedStructLike: CreateNamedStructLike, ordinal, _) =>
             createNamedStructLike.valExprs(ordinal)
    -    }
    -  }
    -}
     
    -/**
    -* push down operations into [[CreateArray]].
    -*/
    -object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field selection (array of structs)
    +      // Remove redundant array indexing.
           case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
             // instead f selecting the field on the entire array,
             // select it from each member of the array.
             // pushing down the operation this way open other optimizations opportunities
             // (i.e. struct(...,x,...).x)
             CreateArray(elems.map(GetStructField(_, ordinal, Some(field.name))))
    -      // push down item selection.
    +
    +      // Remove redundant map lookup.
           case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
             // instead of creating the array and then selecting one row,
             // remove array creation altgether.
    --- End diff --
    
    `altgether` -> `altogether`?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88135 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88135/testReport)** for PR 20687 at commit [`c2137d7`](https://github.com/apache/spark/commit/c2137d78ab474f2ccbcc944eb3654d9b1bdbdcef).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Thanks! Merged to master.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174613887
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    Oh, right. I missed to consider that kind of cases.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88280 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88280/testReport)** for PR 20687 at commit [`8adaa47`](https://github.com/apache/spark/commit/8adaa4748b7b0d7990b45e58d82c7d9f2248923f).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88056/testReport)** for PR 20687 at commit [`f66112c`](https://github.com/apache/spark/commit/f66112c05898611b4efc21c69a1a1c987a537df2).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r172996211
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
     object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    --- End diff --
    
    Good point, done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87739 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87739/testReport)** for PR 20687 at commit [`f446fa2`](https://github.com/apache/spark/commit/f446fa2482f2ccdfa6fec7243bb25b2e232da5fa).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87971/testReport)** for PR 20687 at commit [`63c7098`](https://github.com/apache/spark/commit/63c7098fc4b14af7859580682f17c73abcd7ff08).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173561557
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    +      // Remove redundant field extraction.
           case GetStructField(createNamedStructLike: CreateNamedStructLike, ordinal, _) =>
             createNamedStructLike.valExprs(ordinal)
    -    }
    -  }
    -}
     
    -/**
    -* push down operations into [[CreateArray]].
    -*/
    -object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field selection (array of structs)
    +      // Remove redundant array indexing.
           case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
             // instead f selecting the field on the entire array,
             // select it from each member of the array.
             // pushing down the operation this way open other optimizations opportunities
             // (i.e. struct(...,x,...).x)
             CreateArray(elems.map(GetStructField(_, ordinal, Some(field.name))))
    -      // push down item selection.
    +
    +      // Remove redundant map lookup.
           case ga @ GetArrayItem(CreateArray(elems), IntegerLiteral(idx)) =>
             // instead of creating the array and then selecting one row,
             // remove array creation altgether.
    --- End diff --
    
    Done.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88035/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    @dongjoon-hyun Have you finished the review?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1331/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Will submit a separate PR for tests only. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173707199
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    --- End diff --
    
    aggregateExpressions are resolved from groupingExpressions using semanticEquals, while referring to names from input.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    @gatorsmile ok, I think the coverage right now is a reasonable start - the other test cases I can think of would act more like they're exercising the expression-walking code, not the actual simplification. Look forward to collaborating on the follow-up PR. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1286/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87763/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r174664343
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,54 +22,34 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    -*/
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    + * Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
    + */
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    --- End diff --
    
    @henryr . You can change like the following in order to avoid `Aggregate`.
    ```scala
      override def apply(plan: LogicalPlan): LogicalPlan = plan transform {
        case a: Aggregate => a
        case p => p.transformExpressionsUp {
    ```


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    retest this please


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-23500][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #88391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88391/testReport)** for PR 20687 at commit [`5926301`](https://github.com/apache/spark/commit/592630148af19adbb72703dd1ff49f82c33087d2).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1137/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by henryr <gi...@git.apache.org>.
Github user henryr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20687#discussion_r173561516
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ComplexTypes.scala ---
    @@ -22,32 +22,24 @@ import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
     import org.apache.spark.sql.catalyst.rules.Rule
     
     /**
    -* push down operations into [[CreateNamedStructLike]].
    +* Simplify redundant [[CreateNamedStructLike]], [[CreateArray]] and [[CreateMap]] expressions.
     */
    -object SimplifyCreateStructOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field extraction
    +object SimplifyExtractValueOps extends Rule[LogicalPlan] {
    +  override def apply(plan: LogicalPlan): LogicalPlan = plan transform { case p =>
    +    p.transformExpressionsUp {
    +      // Remove redundant field extraction.
           case GetStructField(createNamedStructLike: CreateNamedStructLike, ordinal, _) =>
             createNamedStructLike.valExprs(ordinal)
    -    }
    -  }
    -}
     
    -/**
    -* push down operations into [[CreateArray]].
    -*/
    -object SimplifyCreateArrayOps extends Rule[LogicalPlan] {
    -  override def apply(plan: LogicalPlan): LogicalPlan = {
    -    plan.transformExpressionsUp {
    -      // push down field selection (array of structs)
    +      // Remove redundant array indexing.
           case GetArrayStructFields(CreateArray(elems), field, ordinal, numFields, containsNull) =>
    --- End diff --
    
    Done


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    **[Test build #87763 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87763/testReport)** for PR 20687 at commit [`f446fa2`](https://github.com/apache/spark/commit/f446fa2482f2ccdfa6fec7243bb25b2e232da5fa).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20687: [SPARK-25000][SQL] Fix complex type simplification rules...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20687
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #20687: [SPARK-23500][SQL] Fix complex type simplificatio...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20687


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org