You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/10/19 18:30:42 UTC

[GitHub] [spark] allisonwang-db opened a new pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

allisonwang-db opened a new pull request #30093:
URL: https://github.com/apache/spark/pull/30093


   ### What changes were proposed in this pull request?
   This PR aims to fix a correctness bug in the optimizer rule `EliminateSorts`. A global sort should not be eliminated even if its child is ordered since we don't know if its child ordering is global or local. For example, in the following scenario, the first sort shouldn't be removed because it has a stronger guarantee than the second sort even if the sort orders are the same for both sorts. 
   
   ```
   Sort(orders, global = True, ...)
     Sort(orders, global = False, ...)
   ```
   
   Since there is no straightforward way to identify whether a node's output ordering is local or global, a global sort node should only be eliminated when its child is ordered and is 1) another global sort or 2) a range operator.
   
   ### Why are the changes needed?
   To fix a bug in rule `EliminateSorts`.
   
   ### Does this PR introduce _any_ user-facing change?
   Yes
   
   ### How was this patch tested?
   Unit tests
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508280122



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##########
@@ -97,14 +97,22 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
-  test("remove redundant order by") {
+  test("remove redundant local sort") {
     val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc, 'b.desc_nullsFirst)
-    val unnecessaryReordered = orderedPlan.limit(2).select('a).orderBy('a.asc, 'b.desc_nullsFirst)
+    val unnecessaryReordered = orderedPlan.limit(2).select('a).sortBy('a.asc, 'b.desc_nullsFirst)
     val optimized = Optimize.execute(unnecessaryReordered.analyze)
     val correctAnswer = orderedPlan.limit(2).select('a).analyze
     comparePlans(Optimize.execute(optimized), correctAnswer)
   }
 
+  test("should not remove global sort") {

Review comment:
       nit: Add a prefix: `SPARK-33183:` for all the added tests?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713241141






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715641025


   **[Test build #130219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130219/testReport)** for PR 30093 at commit [`2e5bc2c`](https://github.com/apache/spark/commit/2e5bc2cf3400b2608cb7a7c2c6c9a7254dbabc0c).
    * This patch **fails Scala style tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715651326


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34819/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714703781






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713179432






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717413729


   **[Test build #130335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130335/testReport)** for PR 30093 at commit [`cdc7dbe`](https://github.com/apache/spark/commit/cdc7dbe1f69d2cec912c536d388eb68756107d61).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511891971



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala
##########
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Remove redundant SortExec node from the spark plan. A sort node is redundant when
+ * its child satisfies both its sort orders and its required child distribution.

Review comment:
       Could you leave some comments about what's a difference form `EliminateSorts` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713235557






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715647873


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34819/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713151337


   **[Test build #130057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130057/testReport)** for PR 30093 at commit [`2966802`](https://github.com/apache/spark/commit/29668029aefe5231104bb5afe0073abd70169b2c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715641031


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511891374



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1242,6 +1242,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts")

Review comment:
       We need this config? cc: @cloud-fan 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r509898482



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,11 +1052,11 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {

Review comment:
       Yes. This case is tricky. Let's consider this example: `df.sortBy('a.asc).sortBy('a.asc).sortBy('a.asc)`
   ```
   Sort(a, false, _)  // A
     Sort(a, false, _)  // B
       Sort(a, false, _)  // C
         ...
   ```
   If we don't use `transformUp`here, the rule will keep both Sort B and C since it will not apply the rule to A's child (B) again. Plan after running this rule will be:
   ```
   Sort(a, false, _) // B
     Sort(a, false, _) // C
        ...
   ```
   And the `Once` strategy's idempotency will be broken since Sort B will be removed if we run this rule again.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714703781






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714797244






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712494785






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712403253






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713171387


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34666/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715640614


   **[Test build #130219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130219/testReport)** for PR 30093 at commit [`2e5bc2c`](https://github.com/apache/spark/commit/2e5bc2cf3400b2608cb7a7c2c6c9a7254dbabc0c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712363669


   **[Test build #130011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130011/testReport)** for PR 30093 at commit [`bf6e45a`](https://github.com/apache/spark/commit/bf6e45a570dfd1eb83e0c8531d9e2a9c571b82ef).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713151337


   **[Test build #130057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130057/testReport)** for PR 30093 at commit [`2966802`](https://github.com/apache/spark/commit/29668029aefe5231104bb5afe0073abd70169b2c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713142946


   **[Test build #130056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130056/testReport)** for PR 30093 at commit [`6d08848`](https://github.com/apache/spark/commit/6d088485bb68374bd9d4469e85238f22639c1dd6).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712388551


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34618/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512162304



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1242,6 +1242,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts")
+    .internal()
+    .doc("Whether to remove redundant physical sort node")
+    .version("3.1.0")

Review comment:
       I am not exactly sure if it's better to change it in this PR or to change it when this PR is backported to 2.4.8.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717724972


   I'm okay to backport it with new rule as it has config.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700


   **[Test build #130296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717577247






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508901147



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##########
@@ -97,12 +97,27 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
-  test("remove redundant order by") {
+  test("SPARK-33183: remove redundant sortBy") {
     val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc, 'b.desc_nullsFirst)
-    val unnecessaryReordered = orderedPlan.limit(2).select('a).orderBy('a.asc, 'b.desc_nullsFirst)

Review comment:
       hm, but this update can cause performance regression? (Just an idea) we cannot add a new physical rewrite rule in the preparation phase for this case?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716870239






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717727764


   @allisonwang-db can you help to create the backport PR? we need to change the config "since version" accordingly.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715641035


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130219/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r509892889



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##########
@@ -97,12 +97,27 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
-  test("remove redundant order by") {
+  test("SPARK-33183: remove redundant sortBy") {
     val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc, 'b.desc_nullsFirst)
-    val unnecessaryReordered = orderedPlan.limit(2).select('a).orderBy('a.asc, 'b.desc_nullsFirst)

Review comment:
       Good idea. It should be straightforward to add.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508902492



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,11 +1052,11 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {

Review comment:
       I remember this comment: https://github.com/apache/spark/pull/21072#discussion_r183392394




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713173560


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34665/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717576204


   **[Test build #130335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130335/testReport)** for PR 30093 at commit [`cdc7dbe`](https://github.com/apache/spark/commit/cdc7dbe1f69d2cec912c536d388eb68756107d61).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717413729


   **[Test build #130335 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130335/testReport)** for PR 30093 at commit [`cdc7dbe`](https://github.com/apache/spark/commit/cdc7dbe1f69d2cec912c536d388eb68756107d61).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508159260



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>

Review comment:
       This was added in 2.4. 
   
   cc @cloud-fan 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511899972



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##########
@@ -97,12 +97,34 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
-  test("remove redundant order by") {

Review comment:
       We need to modify the existing test `test("remove redundant order by") {`? If the behaviour of the test does not change, I think it would be better to avoid updating it where possible.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r513102275



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Remove redundant SortExec node from the spark plan. A sort node is redundant when
+ * its child satisfies both its sort orders and its required child distribution. Note
+ * this rule differs from the Optimizer rule EliminateSorts in that this rule also checks
+ * if the child satisfies the required distribution so that it is safe to remove not only a
+ * local sort but also a global sort when its child already satisfies required sort orders.
+ */
+case class RemoveRedundantSorts(conf: SQLConf) extends Rule[SparkPlan] {

Review comment:
       Yes, however, this is inevitable for such bug fix and correctness bug fix precedes performance. Since the physical rule is well separated and has a config, it looks okay.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715640614


   **[Test build #130219 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130219/testReport)** for PR 30093 at commit [`2e5bc2c`](https://github.com/apache/spark/commit/2e5bc2cf3400b2608cb7a7c2c6c9a7254dbabc0c).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714664000


   **[Test build #130172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130172/testReport)** for PR 30093 at commit [`bcbd9fa`](https://github.com/apache/spark/commit/bcbd9faa84efe030b398e3e4c3553684d4ef70f7).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r510658277



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantSortsSuite.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+
+abstract class RemoveRedundantSortsSuiteBase
+    extends QueryTest
+    with SharedSparkSession
+    with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  private def checkNumSorts(df: DataFrame, count: Int): Unit = {
+    val plan = df.queryExecution.executedPlan
+    assert(collectWithSubqueries(plan) { case s: SortExec => s }.length == count)
+  }
+
+  private def checkSorts(query: String, enabledCount: Int, disabledCount: Int): Unit = {
+    withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "true") {
+      val df = sql(query)
+      checkNumSorts(df, enabledCount)
+      val result = df.collect()
+      withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "false") {
+        val df = sql(query)
+        checkNumSorts(df, disabledCount)
+        checkAnswer(df, result)
+      }
+    }
+  }
+
+  test("remove redundant sorts with limit") {
+    withTempView("t") {
+      spark.range(100).select('id as "key").createOrReplaceTempView("t")
+      val query =
+        """
+          |SELECT key FROM
+          | (SELECT key FROM t WHERE key > 10 ORDER BY key DESC LIMIT 10)
+          |ORDER BY key DESC
+          |""".stripMargin
+      checkSorts(query, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with broadcast hash join") {
+    withTempView("t1", "t2") {
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t1")
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t2")
+      val queryTemplate = """
+        |SELECT t1.key FROM
+        | (SELECT key FROM t1 WHERE key > 10 ORDER BY key DESC LIMIT 10) t1
+        |%s
+        | (SELECT key FROM t2 WHERE key > 50 ORDER BY key DESC LIMIT 100) t2
+        |ON t1.key = t2.key
+        |ORDER BY %s
+      """.stripMargin
+
+      val innerJoinAsc = queryTemplate.format("JOIN", "t2.key ASC")
+      checkSorts(innerJoinAsc, 1, 1)
+
+      val innerJoinDesc = queryTemplate.format("JOIN", "t2.key DESC")
+      checkSorts(innerJoinDesc, 0, 1)
+
+      val innerJoinDesc1 = queryTemplate.format("JOIN", "t1.key DESC")
+      checkSorts(innerJoinDesc1, 1, 1)
+
+      val leftOuterJoinDesc = queryTemplate.format("LEFT JOIN", "t1.key DESC")
+      checkSorts(leftOuterJoinDesc, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with sort merge join") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {

Review comment:
       can we use join hint to specify join type? it's more robust than tuning configs, especially when testing broadcast join.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713163872


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34665/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713173583






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511893077



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantSortsSuite.scala
##########
@@ -0,0 +1,135 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+
+abstract class RemoveRedundantSortsSuiteBase
+    extends QueryTest
+    with SharedSparkSession
+    with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  private def checkNumSorts(df: DataFrame, count: Int): Unit = {
+    val plan = df.queryExecution.executedPlan
+    assert(collectWithSubqueries(plan) { case s: SortExec => s }.length == count)
+  }
+
+  private def checkSorts(query: String, enabledCount: Int, disabledCount: Int): Unit = {
+    withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "true") {
+      val df = sql(query)
+      checkNumSorts(df, enabledCount)
+      val result = df.collect()
+      withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "false") {
+        val df = sql(query)
+        checkNumSorts(df, disabledCount)
+        checkAnswer(df, result)
+      }
+    }
+  }
+
+  test("remove redundant sorts with limit") {
+    withTempView("t") {
+      spark.range(100).select('id as "key").createOrReplaceTempView("t")
+      val query =
+        """
+          |SELECT key FROM
+          | (SELECT key FROM t WHERE key > 10 ORDER BY key DESC LIMIT 10)
+          |ORDER BY key DESC
+          |""".stripMargin
+      checkSorts(query, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with broadcast hash join") {
+    withTempView("t1", "t2") {
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t1")
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t2")
+      
+      val queryTemplate = """
+        |SELECT /*+ BROADCAST(%s) */ t1.key FROM
+        | (SELECT key FROM t1 WHERE key > 10 ORDER BY key DESC LIMIT 10) t1
+        |JOIN
+        | (SELECT key FROM t2 WHERE key > 50 ORDER BY key DESC LIMIT 100) t2
+        |ON t1.key = t2.key
+        |ORDER BY %s
+      """.stripMargin
+
+      val innerJoinAsc = queryTemplate.format("t1", "t2.key ASC")
+      checkSorts(innerJoinAsc, 1, 1)
+
+      val innerJoinDesc = queryTemplate.format("t1", "t2.key DESC")
+      checkSorts(innerJoinDesc, 0, 1)
+
+      val innerJoinDesc1 = queryTemplate.format("t1", "t1.key DESC")
+      checkSorts(innerJoinDesc1, 1, 1)

Review comment:
       nit: `innerJoinDesc` -> `innerJoinDesc1` and `innerJoinDesc1` -> `innerJoinDesc2`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717464978


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34937/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512161304



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##########
@@ -97,12 +97,34 @@ class EliminateSortsSuite extends PlanTest {
     comparePlans(optimized, correctAnswer)
   }
 
-  test("remove redundant order by") {

Review comment:
       The test was actually changed from `orderBy` (global) to `sortBy` (local) so the test title was updated to "remove redundant sort by"




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r510383756



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,11 +1052,11 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {

Review comment:
       Would something like this work?
   ```diff
   diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   index e41e380539..e17013f7e0 100644
   --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   @@ -1054,13 +1054,17 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
    object EliminateSorts extends Rule[LogicalPlan] {
      // transformUp is needed here to ensure idempotency of this rule when removing consecutive
      // local sorts.
   -  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
   +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
        case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
          val newOrders = orders.filterNot(_.child.foldable)
          if (newOrders.isEmpty) child else s.copy(order = newOrders)
   -    case Sort(orders, false, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   -      child
   -    case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
   +    case s @ Sort(orders, _, child) =>
   +      val sortsRemoved = recursiveRemoveSort(child)
   +      if (!s.global && SortOrder.orderingSatisfies(sortsRemoved.outputOrdering, orders)) {
   +        sortsRemoved
   +      } else {
   +        s.copy(child = sortsRemoved)
   +      }
        case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) =>
          j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight))
        case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
   ```
   
   Not as elegant, but should avoid the extra iterations from `transformUp`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
tanelk commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r510383756



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,11 +1052,11 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {

Review comment:
       Would something like this work?
   ```diff
   diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   index e41e380539..e17013f7e0 100644
   --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   @@ -1054,13 +1054,17 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
    object EliminateSorts extends Rule[LogicalPlan] {
      // transformUp is needed here to ensure idempotency of this rule when removing consecutive
      // local sorts.
   -  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {
   +  def apply(plan: LogicalPlan): LogicalPlan = plan transformDown {
        case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
          val newOrders = orders.filterNot(_.child.foldable)
          if (newOrders.isEmpty) child else s.copy(order = newOrders)
   -    case Sort(orders, false, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   -      child
   -    case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
   +    case s @ Sort(orders, _, child) =>
   +      val sortsRemoved = recursiveRemoveSort(child)
   +      if (!s.global && SortOrder.orderingSatisfies(sortsRemoved.outputOrdering, orders)) {
   +        sortsRemoved
   +      } else {
   +        s.copy(child = sortsRemoved)
   +      }
        case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) =>
          j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight))
        case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
   ```
   
   Not as elegant, but should avoid the extra iterations from `transformUp`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713142946


   **[Test build #130056 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130056/testReport)** for PR 30093 at commit [`6d08848`](https://github.com/apache/spark/commit/6d088485bb68374bd9d4469e85238f22639c1dd6).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713147885


   I like the approach of using two simpler rules and removing the special case for the `Range` operator.
   
   @tanelk  I am not sure if I fully understand your proposed logic behind "remove local child sorts inside a local sort". Say we have this case
   ```
   Sort('a.asc, false, _)
     Repartition
        Sort('b.asc, false, _)
   ```
   With the proposed logic, the initial `global` value will be false, which will lead to the `Repartition` case to return `false`, and the `recursiveRemoveSort` will fail to eliminate the child local sort `Sort('b.asc, false, _)`
   
   Can you provide an example when your logic can eliminate local child sorts inside a local sort while the current logic cannot?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713240522


   **[Test build #130057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130057/testReport)** for PR 30093 at commit [`2966802`](https://github.com/apache/spark/commit/29668029aefe5231104bb5afe0073abd70169b2c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713222032


   > With the proposed logic, the initial `global` value will be false, which will lead to the `Repartition` case to return `false`, and the `recursiveRemoveSort` will fail to eliminate the child local sort `Sort('b.asc, false, _)`
   > 
   > Can you provide an example when your logic can eliminate local child sorts inside a local sort while the current logic cannot?
   
   Ah yes, It seems we wouln't have to change the `canEliminateSort` at all.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714797244






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713235557






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714664000


   **[Test build #130172 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130172/testReport)** for PR 30093 at commit [`bcbd9fa`](https://github.com/apache/spark/commit/bcbd9faa84efe030b398e3e4c3553684d4ef70f7).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511928699



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1242,6 +1242,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts")

Review comment:
       If we want to backport this, it's safer to have a config for the new rule, in case it has bugs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714703760


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34778/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715651332






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712403231


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34618/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712494120


   **[Test build #130011 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130011/testReport)** for PR 30093 at commit [`bf6e45a`](https://github.com/apache/spark/commit/bf6e45a570dfd1eb83e0c8531d9e2a9c571b82ef).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717444165


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34937/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r513086916



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Remove redundant SortExec node from the spark plan. A sort node is redundant when
+ * its child satisfies both its sort orders and its required child distribution. Note
+ * this rule differs from the Optimizer rule EliminateSorts in that this rule also checks
+ * if the child satisfies the required distribution so that it is safe to remove not only a
+ * local sort but also a global sort when its child already satisfies required sort orders.
+ */
+case class RemoveRedundantSorts(conf: SQLConf) extends Rule[SparkPlan] {

Review comment:
       > We should only backport the fix of EliminateSorts to 2.4.
   
   But, the partial backport can cause performance regressin in 2.4 ? (global sort -> global sort case) https://github.com/apache/spark/pull/30093#discussion_r508901147




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717465000






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717577247






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511810245



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,12 +1052,18 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform applyLocally
+
+  val applyLocally: PartialFunction[LogicalPlan, LogicalPlan] = {

Review comment:
       nit: `private val`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713234957


   **[Test build #130056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130056/testReport)** for PR 30093 at commit [`6d08848`](https://github.com/apache/spark/commit/6d088485bb68374bd9d4469e85238f22639c1dd6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784902


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716784916






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715651332






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716748700


   **[Test build #130296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714691464


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34778/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717892704


   +1 to backport this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713173583






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716770165


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34897/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717715392


   @viirya @maropu what do you think about backporting? Shall we backport without the new rule or with it?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712363669


   **[Test build #130011 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130011/testReport)** for PR 30093 at commit [`bf6e45a`](https://github.com/apache/spark/commit/bf6e45a570dfd1eb83e0c8531d9e2a9c571b82ef).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712425110


   cc @viirya and @sunchao 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717438206


   The current change is out of the PR title and description. Before merging this, can you update the title and description? Thanks.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512875644



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/RemoveRedundantSorts.scala
##########
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.catalyst.expressions.SortOrder
+import org.apache.spark.sql.catalyst.rules.Rule
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Remove redundant SortExec node from the spark plan. A sort node is redundant when
+ * its child satisfies both its sort orders and its required child distribution. Note
+ * this rule differs from the Optimizer rule EliminateSorts in that this rule also checks
+ * if the child satisfies the required distribution so that it is safe to remove not only a
+ * local sort but also a global sort when its child already satisfies required sort orders.
+ */
+case class RemoveRedundantSorts(conf: SQLConf) extends Rule[SparkPlan] {

Review comment:
       Actually I think we should not backport this to 2.4. This is a new rule and looks like an improvement.
   
   We should only backport the fix of `EliminateSorts` to 2.4.
   
   But as it is late and this PR is close to merge and we have a config for it. I'm okay if others think it is fine.
   
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713059186


   Shouldn't it be like this?
   
   ```diff
   diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   index c37dac90c5..79610b4ab2 100644
   --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   @@ -1058,15 +1058,9 @@ object EliminateSorts extends Rule[LogicalPlan] {
        case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
          val newOrders = orders.filterNot(_.child.foldable)
          if (newOrders.isEmpty) child else s.copy(order = newOrders)
   -    case s @ Sort(orders, global, child)
   -        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   -      (global, child) match {
   -        case (false, _) => child
   -        case (true, r: Range) => r
   -        case (true, s @ Sort(_, true, _)) => s
   -        case (true, _) => s.copy(child = recursiveRemoveSort(child))
   -      }
   -    case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
   +    case Sort(orders, false, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   +      child
   +    case s @ Sort(_, true, child) => s.copy(child = recursiveRemoveSort(child))
        case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) =>
          j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight))
        case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
   ```
   
   Two simpler rules:
   * Remove local sort if child satisfies the required ordering
   * Keep global sorts, but remove all sorts from child
   
   This fails two tests:
   * Range test, but it is not correct because of what @viirya said
   * In the "remove two consecutive global sorts with same ordering" this keeps the parent sort, but in the test it keeps the child sort.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717465000


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508279401



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
-      child
+    case s @ Sort(orders, global, child)
+        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+      (global, child) match {
+        case (false, _) => child
+        case (true, r: Range) => r
+        case (true, s @ Sort(_, true, _)) => s
+        case (true, _) => s.copy(child = recursiveRemoveSort(child))

Review comment:
       Could you modify the code to handle this case in L1069? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713179432






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu edited a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
maropu edited a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717892704


   +1 to backport this. (I think it'd be better to avoid the performance regression of existing user's queries as much as possible.)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717465009


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34937/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712403253


   Merged build finished. Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r509880688



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1052,11 +1052,11 @@ object CombineFilters extends Rule[LogicalPlan] with PredicateHelper {
  *    function is order irrelevant
  */
 object EliminateSorts extends Rule[LogicalPlan] {
-  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transformUp {

Review comment:
       @allisonwang-db does it have to be `transformUp`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-715641031






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #30093:
URL: https://github.com/apache/spark/pull/30093


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-714795962


   **[Test build #130172 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130172/testReport)** for PR 30093 at commit [`bcbd9fa`](https://github.com/apache/spark/commit/bcbd9faa84efe030b398e3e4c3553684d4ef70f7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class RemoveRedundantSorts(conf: SQLConf) extends Rule[SparkPlan] `


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r510658277



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantSortsSuite.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+
+abstract class RemoveRedundantSortsSuiteBase
+    extends QueryTest
+    with SharedSparkSession
+    with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  private def checkNumSorts(df: DataFrame, count: Int): Unit = {
+    val plan = df.queryExecution.executedPlan
+    assert(collectWithSubqueries(plan) { case s: SortExec => s }.length == count)
+  }
+
+  private def checkSorts(query: String, enabledCount: Int, disabledCount: Int): Unit = {
+    withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "true") {
+      val df = sql(query)
+      checkNumSorts(df, enabledCount)
+      val result = df.collect()
+      withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "false") {
+        val df = sql(query)
+        checkNumSorts(df, disabledCount)
+        checkAnswer(df, result)
+      }
+    }
+  }
+
+  test("remove redundant sorts with limit") {
+    withTempView("t") {
+      spark.range(100).select('id as "key").createOrReplaceTempView("t")
+      val query =
+        """
+          |SELECT key FROM
+          | (SELECT key FROM t WHERE key > 10 ORDER BY key DESC LIMIT 10)
+          |ORDER BY key DESC
+          |""".stripMargin
+      checkSorts(query, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with broadcast hash join") {
+    withTempView("t1", "t2") {
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t1")
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t2")
+      val queryTemplate = """
+        |SELECT t1.key FROM
+        | (SELECT key FROM t1 WHERE key > 10 ORDER BY key DESC LIMIT 10) t1
+        |%s
+        | (SELECT key FROM t2 WHERE key > 50 ORDER BY key DESC LIMIT 100) t2
+        |ON t1.key = t2.key
+        |ORDER BY %s
+      """.stripMargin
+
+      val innerJoinAsc = queryTemplate.format("JOIN", "t2.key ASC")
+      checkSorts(innerJoinAsc, 1, 1)
+
+      val innerJoinDesc = queryTemplate.format("JOIN", "t2.key DESC")
+      checkSorts(innerJoinDesc, 0, 1)
+
+      val innerJoinDesc1 = queryTemplate.format("JOIN", "t1.key DESC")
+      checkSorts(innerJoinDesc1, 1, 1)
+
+      val leftOuterJoinDesc = queryTemplate.format("LEFT JOIN", "t1.key DESC")
+      checkSorts(leftOuterJoinDesc, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with sort merge join") {
+    withSQLConf(SQLConf.AUTO_BROADCASTJOIN_THRESHOLD.key -> "-1") {

Review comment:
       can we use join hint to specify join type? it's more robust than tuning configs.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tanelk commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
tanelk commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713073103


   Taking it a bit further we can also remove local child sorts inside a local sort:
   
   ```diff
   diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   index c37dac90c5..dee3e40f0d 100644
   --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
   @@ -1058,33 +1058,29 @@ object EliminateSorts extends Rule[LogicalPlan] {
        case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
          val newOrders = orders.filterNot(_.child.foldable)
          if (newOrders.isEmpty) child else s.copy(order = newOrders)
   -    case s @ Sort(orders, global, child)
   -        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   -      (global, child) match {
   -        case (false, _) => child
   -        case (true, r: Range) => r
   -        case (true, s @ Sort(_, true, _)) => s
   -        case (true, _) => s.copy(child = recursiveRemoveSort(child))
   -      }
   -    case s @ Sort(_, _, child) => s.copy(child = recursiveRemoveSort(child))
   +    case Sort(orders, false, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
   +      child
   +    case s @ Sort(_, global, child) => s.copy(child = recursiveRemoveSort(child, global))
        case j @ Join(originLeft, originRight, _, cond, _) if cond.forall(_.deterministic) =>
          j.copy(left = recursiveRemoveSort(originLeft), right = recursiveRemoveSort(originRight))
        case g @ Aggregate(_, aggs, originChild) if isOrderIrrelevantAggs(aggs) =>
          g.copy(child = recursiveRemoveSort(originChild))
      }
   
   -  private def recursiveRemoveSort(plan: LogicalPlan): LogicalPlan = plan match {
   -    case Sort(_, _, child) => recursiveRemoveSort(child)
   -    case other if canEliminateSort(other) =>
   -      other.withNewChildren(other.children.map(recursiveRemoveSort))
   +  private def recursiveRemoveSort(
   +    plan: LogicalPlan, global: Boolean = true): LogicalPlan = plan match {
   +    case Sort(_, false, child) if !global => recursiveRemoveSort(child, global)
   +    case Sort(_, _, child) if global => recursiveRemoveSort(child, global)
   +    case other if canEliminateSort(other, global) =>
   +      other.withNewChildren(other.children.map(recursiveRemoveSort(_, global)))
        case _ => plan
      }
   
   -  private def canEliminateSort(plan: LogicalPlan): Boolean = plan match {
   +  private def canEliminateSort(plan: LogicalPlan, global: Boolean): Boolean = plan match {
        case p: Project => p.projectList.forall(_.deterministic)
        case f: Filter => f.condition.deterministic
   -    case r: RepartitionByExpression => r.partitionExpressions.forall(_.deterministic)
   -    case _: Repartition => true
   +    case r: RepartitionByExpression => r.partitionExpressions.forall(_.deterministic) && global
   +    case _: Repartition => global
        case _ => false
      }
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r512162304



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1242,6 +1242,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts")
+    .internal()
+    .doc("Whether to remove redundant physical sort node")
+    .version("3.1.0")

Review comment:
       I am not exactly sure if it's better to change it in this PR or to change it when this PR is backported to 2.4.8 (in case the current change does not work in 2.4.8)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-717715051


   thanks, merging to master!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] allisonwang-db commented on pull request #30093: [SPARK-33183][SQL] Fix Optimizer rule EliminateSorts and add a physical rule to remove redundant sorts

Posted by GitBox <gi...@apache.org>.
allisonwang-db commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-719139123


   I will update the master branch config version to 2.4.8 once the backport PRs are merged.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712494785






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713241141






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-716869601


   **[Test build #130296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130296/testReport)** for PR 30093 at commit [`59f9cd4`](https://github.com/apache/spark/commit/59f9cd4be7f7c93f8cf01c66dfb5619af548e66b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r510659012



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/RemoveRedundantSortsSuite.scala
##########
@@ -0,0 +1,136 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import org.apache.spark.sql.{DataFrame, QueryTest}
+import org.apache.spark.sql.execution.adaptive.{AdaptiveSparkPlanHelper, DisableAdaptiveExecutionSuite, EnableAdaptiveExecutionSuite}
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.test.SharedSparkSession
+
+
+abstract class RemoveRedundantSortsSuiteBase
+    extends QueryTest
+    with SharedSparkSession
+    with AdaptiveSparkPlanHelper {
+  import testImplicits._
+
+  private def checkNumSorts(df: DataFrame, count: Int): Unit = {
+    val plan = df.queryExecution.executedPlan
+    assert(collectWithSubqueries(plan) { case s: SortExec => s }.length == count)
+  }
+
+  private def checkSorts(query: String, enabledCount: Int, disabledCount: Int): Unit = {
+    withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "true") {
+      val df = sql(query)
+      checkNumSorts(df, enabledCount)
+      val result = df.collect()
+      withSQLConf(SQLConf.REMOVE_REDUNDANT_SORTS_ENABLED.key -> "false") {
+        val df = sql(query)
+        checkNumSorts(df, disabledCount)
+        checkAnswer(df, result)
+      }
+    }
+  }
+
+  test("remove redundant sorts with limit") {
+    withTempView("t") {
+      spark.range(100).select('id as "key").createOrReplaceTempView("t")
+      val query =
+        """
+          |SELECT key FROM
+          | (SELECT key FROM t WHERE key > 10 ORDER BY key DESC LIMIT 10)
+          |ORDER BY key DESC
+          |""".stripMargin
+      checkSorts(query, 0, 1)
+    }
+  }
+
+  test("remove redundant sorts with broadcast hash join") {
+    withTempView("t1", "t2") {
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t1")
+      spark.range(1000).select('id as "key").createOrReplaceTempView("t2")
+      val queryTemplate = """
+        |SELECT t1.key FROM
+        | (SELECT key FROM t1 WHERE key > 10 ORDER BY key DESC LIMIT 10) t1
+        |%s
+        | (SELECT key FROM t2 WHERE key > 50 ORDER BY key DESC LIMIT 100) t2
+        |ON t1.key = t2.key
+        |ORDER BY %s
+      """.stripMargin
+
+      val innerJoinAsc = queryTemplate.format("JOIN", "t2.key ASC")
+      checkSorts(innerJoinAsc, 1, 1)
+
+      val innerJoinDesc = queryTemplate.format("JOIN", "t2.key DESC")
+      checkSorts(innerJoinDesc, 0, 1)
+
+      val innerJoinDesc1 = queryTemplate.format("JOIN", "t1.key DESC")
+      checkSorts(innerJoinDesc1, 1, 1)
+
+      val leftOuterJoinDesc = queryTemplate.format("LEFT JOIN", "t1.key DESC")

Review comment:
       we don't need to use LEFT JOIN if we use join hint to specify which side to broadcast.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] maropu commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r511958568



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1242,6 +1242,13 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val REMOVE_REDUNDANT_SORTS_ENABLED = buildConf("spark.sql.execution.removeRedundantSorts")
+    .internal()
+    .doc("Whether to remove redundant physical sort node")
+    .version("3.1.0")

Review comment:
       If we backport this into branch-2.4, `3.1.0` -> `2.4.8`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-713179418


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34666/
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] viirya commented on a change in pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30093:
URL: https://github.com/apache/spark/pull/30093#discussion_r508160798



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -1056,8 +1058,14 @@ object EliminateSorts extends Rule[LogicalPlan] {
     case s @ Sort(orders, _, child) if orders.isEmpty || orders.exists(_.child.foldable) =>
       val newOrders = orders.filterNot(_.child.foldable)
       if (newOrders.isEmpty) child else s.copy(order = newOrders)
-    case Sort(orders, true, child) if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
-      child
+    case s @ Sort(orders, global, child)
+        if SortOrder.orderingSatisfies(child.outputOrdering, orders) =>
+      (global, child) match {
+        case (false, _) => child
+        case (true, r: Range) => r

Review comment:
       This assumes we know Range's global ordering in advance. This seems to leak physical stuff into the Optimizer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #30093: [SPARK-33183][SQL] Fix EliminateSorts bug when removing global sorts

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30093:
URL: https://github.com/apache/spark/pull/30093#issuecomment-712403264


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34618/
   Test FAILed.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org