You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/07 00:58:41 UTC

[GitHub] [spark] imback82 opened a new pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

imback82 opened a new pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842
 
 
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   Currently, in the following scenario, an unnecessary `Sort` node is introduced:
   ```scala
   withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
     val df = (0 until 20).toDF("i").as("df")
     df.repartition(8, df("i")).write.format("parquet")
       .bucketBy(8, "i").sortBy("i").saveAsTable("t")
     val t1 = spark.table("t")
     val t2 = t1.selectExpr("i as ii")
     t1.join(t2, t1("i") === t2("ii")).explain
   }
   ```
   ```
   == Physical Plan ==
   *(3) SortMergeJoin [i#8], [ii#10], Inner
   :- *(1) Project [i#8]
   :  +- *(1) Filter isnotnull(i#8)
   :     +- *(1) ColumnarToRow
   :        +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
   +- *(2) Sort [ii#10 ASC NULLS FIRST], false, 0    <==== UNNECESSARY
      +- *(2) Project [i#8 AS ii#10]
         +- *(2) Filter isnotnull(i#8)
            +- *(2) ColumnarToRow
               +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
   ```
   Notice that `Sort [ii#10 ASC NULLS FIRST], false, 0` is introduced even though the underlying data is already sorted. This is because `outputOrdering` doesn't handle aliases correctly. This PR proposes to fix this issue.
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   To better handle aliases in `outputOrdering`.
   
   ### Does this PR introduce any user-facing change?
   <!--
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If no, write 'No'.
   -->
   Yes, now with the fix, the `explain` prints out the following:
   ```
   == Physical Plan ==
   *(3) SortMergeJoin [i#8], [ii#10], Inner
   :- *(1) Project [i#8]
   :  +- *(1) Filter isnotnull(i#8)
   :     +- *(1) ColumnarToRow
   :        +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
   +- *(2) Project [i#8 AS ii#10]
      +- *(2) Filter isnotnull(i#8)
         +- *(2) ColumnarToRow
            +- FileScan parquet default.t[i#8] Batched: true, DataFilters: [isnotnull(i#8)], Format: Parquet, Location: InMemoryFileIndex[file:/..., PartitionFilters: [], PushedFilters: [IsNotNull(i)], ReadSchema: struct<i:int>, SelectedBucketsCount: 8 out of 8
   ```
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   Tests added.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389331713
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputs.scala
 ##########
 @@ -16,16 +16,18 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression, SortOrder}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 
 /**
- * A trait that handles aliases in the `outputExpressions` to produce `outputPartitioning`
- * that satisfies output distribution requirements.
+ * A trait that handles aliases in the `outputExpressions` and `orderingExpressions` to produce
+ * `outputPartitioning` and `outputOrdering` that satisfy distribution and ordering requirements.
  */
-trait AliasAwareOutputPartitioning extends UnaryExecNode {
+trait AliasAwareOutputs extends UnaryExecNode {
 
 Review comment:
   Yes, good idea.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596938107
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24334/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389984828
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
 ##########
 @@ -604,6 +604,18 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("sort should not be introduced when aliases are used") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+      withTable("t") {
+        df1.repartition(1).write.format("parquet").bucketBy(8, "i").sortBy("i").saveAsTable("t")
 
 Review comment:
   I added a test for `SortAggregateExec`. `HashAggregateExec` and `ObjectAggregateExec` are not affected since ordering is not involved with them.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596799294
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597049062
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119605/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596164531
 
 
   cc: @cloud-fan 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877438
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877211
 
 
   **[Test build #119598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119598/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r390103805
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
 ##########
 @@ -975,6 +975,25 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
       }
     }
   }
+
+  test("aliases in the sort aggregate expressions should not introduce extra sort") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+      withSQLConf(SQLConf.USE_OBJECT_HASH_AGG.key -> "false") {
+        val t1 = spark.range(10).selectExpr("floor(id/4) as k1")
+        val t2 = spark.range(20).selectExpr("floor(id/4) as k2")
+
+        val agg1 = t1.groupBy("k1").agg(collect_list("k1")).withColumnRenamed("k1", "k3")
+        val agg2 = t2.groupBy("k2").agg(collect_list("k2"))
+
+        val planned = agg1.join(agg2, $"k3" === $"k2").queryExecution.executedPlan
+        assert(planned.collect { case s: SortAggregateExec => s }.nonEmpty)
+
+        // We expect two SortExec nodes on each side of join.
+        val sorts = planned.collect { case s: SortExec => s }
+        assert(sorts.size == 4)
 
 Review comment:
   ah i see

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596935639
 
 
   **[Test build #119598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119598/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389323444
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
 ##########
 @@ -604,6 +604,18 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("sort should not be introduced when aliases are used") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+      withTable("t") {
+        df1.repartition(1).write.format("parquet").bucketBy(8, "i").sortBy("i").saveAsTable("t")
 
 Review comment:
   Could you add more tests, e.g., orderBy cases?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24227/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596048766
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119498/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163156
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
maropu commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597362994
 
 
   late LGTM, thanks, guys.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596935988
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119598/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596935988
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119598/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389985918
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioningAndOrdering.scala
 ##########
 @@ -16,16 +16,18 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression, SortOrder}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 
 /**
- * A trait that handles aliases in the `outputExpressions` to produce `outputPartitioning`
- * that satisfies output distribution requirements.
+ * A trait that handles aliases in the `outputExpressions` and `orderingExpressions` to produce
+ * `outputPartitioning` and `outputOrdering` that satisfy distribution and ordering requirements.
  */
-trait AliasAwareOutputPartitioning extends UnaryExecNode {
+trait AliasAwareOutputPartitioningAndOrdering extends UnaryExecNode {
   protected def outputExpressions: Seq[NamedExpression]
 
+  protected def orderingExpressions: Seq[SortOrder] = child.outputOrdering
 
 Review comment:
   Good catch. Now that you mention it, should I just separate this into `AliasAwareOutputPartitioning` and `AliasAwareOutputOrdering` and apply the latter only to `SortAggregateExec` and `ProjectExec` so that the intention is clear?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163083
 
 
   **[Test build #119528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119528/testReport)** for PR 27842 at commit [`add187a`](https://github.com/apache/spark/commit/add187ae6443a8db71d24928c625794195604e0c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025305
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389331709
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
 ##########
 @@ -604,6 +604,18 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("sort should not be introduced when aliases are used") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+      withTable("t") {
+        df1.repartition(1).write.format("parquet").bucketBy(8, "i").sortBy("i").saveAsTable("t")
 
 Review comment:
   Do you mean something like the following?
   ```
   val t1 = spark.range(10).selectExpr("id as k1").orderBy("k1")
     .selectExpr("k1 as k11").orderBy("k11")
   val t2 = spark.range(10).selectExpr("id as k2").orderBy("k2")
   t1.join(t2, t1("k11") === t2("k2")).explain(true)
   ```
   Extra `Sort` is optimized away, so it doesn't affect the physical plan (no extra sort) without this change.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389323467
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputs.scala
 ##########
 @@ -16,16 +16,18 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression, SortOrder}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 
 /**
- * A trait that handles aliases in the `outputExpressions` to produce `outputPartitioning`
- * that satisfies output distribution requirements.
+ * A trait that handles aliases in the `outputExpressions` and `orderingExpressions` to produce
+ * `outputPartitioning` and `outputOrdering` that satisfy distribution and ordering requirements.
  */
-trait AliasAwareOutputPartitioning extends UnaryExecNode {
+trait AliasAwareOutputs extends UnaryExecNode {
 
 Review comment:
   How about the name, `AliasAwareOutputPartitioningAndOrdering`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025305
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596799305
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24318/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596935982
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877211
 
 
   **[Test build #119598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119598/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596177584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119528/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597048195
 
 
   **[Test build #119605 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119605/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597049054
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596177582
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596939995
 
 
   **[Test build #119605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119605/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596868928
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596799294
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596048765
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596177419
 
 
   **[Test build #119528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119528/testReport)** for PR 27842 at commit [`add187a`](https://github.com/apache/spark/commit/add187ae6443a8db71d24928c625794195604e0c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r390032035
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
 ##########
 @@ -975,6 +975,25 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
       }
     }
   }
+
+  test("aliases in the sort aggregate expressions should not introduce extra sort") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+      withSQLConf(SQLConf.USE_OBJECT_HASH_AGG.key -> "false") {
+        val t1 = spark.range(10).selectExpr("floor(id/4) as k1")
+        val t2 = spark.range(20).selectExpr("floor(id/4) as k2")
+
+        val agg1 = t1.groupBy("k1").agg(collect_list("k1")).withColumnRenamed("k1", "k3")
+        val agg2 = t2.groupBy("k2").agg(collect_list("k2"))
+
+        val planned = agg1.join(agg2, $"k3" === $"k2").queryExecution.executedPlan
+        assert(planned.collect { case s: SortAggregateExec => s }.nonEmpty)
+
+        // We expect two SortExec nodes on each side of join.
+        val sorts = planned.collect { case s: SortExec => s }
+        assert(sorts.size == 4)
 
 Review comment:
   FYI, this is 5 without the fix.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389513418
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioningAndOrdering.scala
 ##########
 @@ -16,16 +16,18 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression, SortOrder}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 
 /**
- * A trait that handles aliases in the `outputExpressions` to produce `outputPartitioning`
- * that satisfies output distribution requirements.
+ * A trait that handles aliases in the `outputExpressions` and `orderingExpressions` to produce
+ * `outputPartitioning` and `outputOrdering` that satisfy distribution and ordering requirements.
  */
-trait AliasAwareOutputPartitioning extends UnaryExecNode {
+trait AliasAwareOutputPartitioningAndOrdering extends UnaryExecNode {
   protected def outputExpressions: Seq[NamedExpression]
 
+  protected def orderingExpressions: Seq[SortOrder] = child.outputOrdering
 
 Review comment:
   this implicitly indicates that the plan inherits output ordering from its child. This seems risky to me as `SparkPlan.outputOrdering` is `Nil` by default.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025115
 
 
   **[Test build #119498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119498/testReport)** for PR 27842 at commit [`0dc6b08`](https://github.com/apache/spark/commit/0dc6b087e7fd7a9cd71120ef44740863cb27ddb4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389342226
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
 ##########
 @@ -604,6 +604,18 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("sort should not be introduced when aliases are used") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+      withTable("t") {
+        df1.repartition(1).write.format("parquet").bucketBy(8, "i").sortBy("i").saveAsTable("t")
 
 Review comment:
   Ah, I see. How about the aggregate case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025115
 
 
   **[Test build #119498 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119498/testReport)** for PR 27842 at commit [`0dc6b08`](https://github.com/apache/spark/commit/0dc6b087e7fd7a9cd71120ef44740863cb27ddb4).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596798791
 
 
   **[Test build #119588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119588/testReport)** for PR 27842 at commit [`346a6d3`](https://github.com/apache/spark/commit/346a6d32a2225ca532cff5a3b7937e57168db8db).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596868928
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597054964
 
 
   thanks, merging to master!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r389331709
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala
 ##########
 @@ -604,6 +604,18 @@ abstract class BucketedReadSuite extends QueryTest with SQLTestUtils {
     }
   }
 
+  test("sort should not be introduced when aliases are used") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "0") {
+      withTable("t") {
+        df1.repartition(1).write.format("parquet").bucketBy(8, "i").sortBy("i").saveAsTable("t")
 
 Review comment:
   Do you mean something like the following?
   ```
   val t1 = spark.range(10).selectExpr("id as k1").orderBy("k1")
     .selectExpr("k1 as k11").orderBy("k11")
   val t2 = spark.range(10).selectExpr("id as k2").orderBy("k2")
   t1.join(t2, t1("k11") === t2("k2")).explain(true)
   ```
   Extra `Sort` is optimized away, so it doesn't affect the physical plan (no extra sort).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596798791
 
 
   **[Test build #119588 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119588/testReport)** for PR 27842 at commit [`346a6d3`](https://github.com/apache/spark/commit/346a6d32a2225ca532cff5a3b7937e57168db8db).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596939995
 
 
   **[Test build #119605 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119605/testReport)** for PR 27842 at commit [`fc6bc62`](https://github.com/apache/spark/commit/fc6bc62356ef1a4bd23a69d5a73a5c7c78995c03).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596937681
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596938098
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596025309
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24227/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596868932
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119588/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596799305
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24318/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596048647
 
 
   **[Test build #119498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119498/testReport)** for PR 27842 at commit [`0dc6b08`](https://github.com/apache/spark/commit/0dc6b087e7fd7a9cd71120ef44740863cb27ddb4).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596048765
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596938107
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24334/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596938098
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596177584
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119528/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877438
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597049054
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596868501
 
 
   **[Test build #119588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119588/testReport)** for PR 27842 at commit [`346a6d3`](https://github.com/apache/spark/commit/346a6d32a2225ca532cff5a3b7937e57168db8db).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163083
 
 
   **[Test build #119528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/119528/testReport)** for PR 27842 at commit [`add187a`](https://github.com/apache/spark/commit/add187ae6443a8db71d24928c625794195604e0c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r390024285
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/AliasAwareOutputPartitioningAndOrdering.scala
 ##########
 @@ -16,16 +16,18 @@
  */
 package org.apache.spark.sql.execution
 
-import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression}
+import org.apache.spark.sql.catalyst.expressions.{Alias, Attribute, AttributeReference, Expression, NamedExpression, SortOrder}
 import org.apache.spark.sql.catalyst.plans.physical.{HashPartitioning, Partitioning}
 
 /**
- * A trait that handles aliases in the `outputExpressions` to produce `outputPartitioning`
- * that satisfies output distribution requirements.
+ * A trait that handles aliases in the `outputExpressions` and `orderingExpressions` to produce
+ * `outputPartitioning` and `outputOrdering` that satisfy distribution and ordering requirements.
  */
-trait AliasAwareOutputPartitioning extends UnaryExecNode {
+trait AliasAwareOutputPartitioningAndOrdering extends UnaryExecNode {
   protected def outputExpressions: Seq[NamedExpression]
 
+  protected def orderingExpressions: Seq[SortOrder] = child.outputOrdering
 
 Review comment:
   Ah, nice catch... Yea, could you try to split it into the two parts?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877442
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24328/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
imback82 commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r390103406
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
 ##########
 @@ -975,6 +975,25 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
       }
     }
   }
+
+  test("aliases in the sort aggregate expressions should not introduce extra sort") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+      withSQLConf(SQLConf.USE_OBJECT_HASH_AGG.key -> "false") {
+        val t1 = spark.range(10).selectExpr("floor(id/4) as k1")
+        val t2 = spark.range(20).selectExpr("floor(id/4) as k2")
+
+        val agg1 = t1.groupBy("k1").agg(collect_list("k1")).withColumnRenamed("k1", "k3")
+        val agg2 = t2.groupBy("k2").agg(collect_list("k2"))
+
+        val planned = agg1.join(agg2, $"k3" === $"k2").queryExecution.executedPlan
+        assert(planned.collect { case s: SortAggregateExec => s }.nonEmpty)
+
+        // We expect two SortExec nodes on each side of join.
+        val sorts = planned.collect { case s: SortExec => s }
+        assert(sorts.size == 4)
 
 Review comment:
   Oh, nothing between them. Two sorts under `SortAggregateExec`:
   ```
      +- SortAggregate(key=[k2#224L], functions=[collect_list(k2#224L, 0, 0)], output=[k2#224L, collect_list(k2)#236])
         +- *(5) Sort [k2#224L ASC NULLS FIRST], false, 0
            +- Exchange hashpartitioning(k2#224L, 5), true, [id=#151]
               +- SortAggregate(key=[k2#224L], functions=[partial_collect_list(k2#224L, 0, 0)], output=[k2#224L, buf#254])
                  +- *(4) Sort [k2#224L ASC NULLS FIRST], false, 0
                     +- *(4) Project [FLOOR((cast(id#222L as double) / 4.0)) AS k2#224L]
                        +- *(4) Range (0, 20, step=1, splits=2)
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-597049062
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119605/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596177582
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596868932
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119588/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596877442
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24328/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24259/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596048766
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/119498/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163158
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/24259/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596163156
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#issuecomment-596935982
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #27842: [SPARK-31078][SQL] Respect aliases in output ordering
URL: https://github.com/apache/spark/pull/27842#discussion_r390101398
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/PlannerSuite.scala
 ##########
 @@ -975,6 +975,25 @@ class PlannerSuite extends SharedSparkSession with AdaptiveSparkPlanHelper {
       }
     }
   }
+
+  test("aliases in the sort aggregate expressions should not introduce extra sort") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {
+      withSQLConf(SQLConf.USE_OBJECT_HASH_AGG.key -> "false") {
+        val t1 = spark.range(10).selectExpr("floor(id/4) as k1")
+        val t2 = spark.range(20).selectExpr("floor(id/4) as k2")
+
+        val agg1 = t1.groupBy("k1").agg(collect_list("k1")).withColumnRenamed("k1", "k3")
+        val agg2 = t2.groupBy("k2").agg(collect_list("k2"))
+
+        val planned = agg1.join(agg2, $"k3" === $"k2").queryExecution.executedPlan
+        assert(planned.collect { case s: SortAggregateExec => s }.nonEmpty)
+
+        // We expect two SortExec nodes on each side of join.
+        val sorts = planned.collect { case s: SortExec => s }
+        assert(sorts.size == 4)
 
 Review comment:
   why do we have sorts between `SortAggregateExec` and `SortMergeJoinExec`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org