You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/10/16 15:07:16 UTC

[PR] [WIP][SPARK-45543][SQL] WindowGroupLimit causes bug if window expressions exists non-rank function. [spark]

beliefer opened a new pull request, #43385:
URL: https://github.com/apache/spark/pull/43385

   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   'Yes'.
   Fix the bug.
   
   
   ### How was this patch tested?
   New test cases.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   'No'.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43385:
URL: https://github.com/apache/spark/pull/43385#issuecomment-1767797615

   ping @cloud-fan 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363852786


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,6 +52,16 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
+  /**
+   * Require other window functions to have the same window frame as the rank-like functions.

Review Comment:
   ```
   All window frames should be expanding windows, so that we can safely do early stop.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] `InferWindowGroupLimit` causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer closed pull request #43385: [SPARK-45543][SQL] `InferWindowGroupLimit` causes bug if the other window functions haven't the same window frame as the rank-like functions
URL: https://github.com/apache/spark/pull/43385


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363848349


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,6 +52,16 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
+  /**
+   * Require other window functions to have the same window frame as the rank-like functions.
+   */
+  private def satisfySpecifiedWindowFrame(

Review Comment:
   I think `isExpandingWindow` is a better name



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363837920


##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1521,4 +1521,119 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       assert(windows.size === 1)
     }
   }
+
+  test("SPARK-45543: WindowGroupLimit causes bug if window expressions exists non-rank function") {
+    val df = Seq(
+      (1, "Dave", 1, 2020),
+      (2, "Dave", 1, 2021),
+      (3, "Dave", 2, 2022),
+      (4, "Dave", 3, 2023),
+      (5, "Dave", 3, 2024),
+      (6, "Mark", 2, 2022),
+      (7, "Mark", 3, 2023),
+      (8, "Mark", 3, 2024),
+      (9, "Amy", 6, 2021),
+      (10, "Amy", 5, 2022),
+      (11, "Amy", 6, 2023),
+      (12, "Amy", 7, 2024),
+      (13, "John", 7, 2024)).toDF("id", "name", "score", "year")
+
+    val window = Window.partitionBy($"year").orderBy($"score".desc)
+    val window2 = window.rowsBetween(Window.unboundedPreceding, Window.currentRow)
+    val window3 = window.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
+
+    Seq(true, false).foreach { enableEvaluator =>
+      withSQLConf(SQLConf.USE_PARTITION_EVALUATOR.key -> enableEvaluator.toString) {

Review Comment:
   no need to test it. just test with the default



##########
sql/core/src/test/scala/org/apache/spark/sql/DataFrameWindowFunctionsSuite.scala:
##########
@@ -1521,4 +1521,119 @@ class DataFrameWindowFunctionsSuite extends QueryTest
       assert(windows.size === 1)
     }
   }
+
+  test("SPARK-45543: WindowGroupLimit causes bug if window expressions exists non-rank function") {
+    val df = Seq(
+      (1, "Dave", 1, 2020),
+      (2, "Dave", 1, 2021),
+      (3, "Dave", 2, 2022),
+      (4, "Dave", 3, 2023),
+      (5, "Dave", 3, 2024),
+      (6, "Mark", 2, 2022),
+      (7, "Mark", 3, 2023),
+      (8, "Mark", 3, 2024),
+      (9, "Amy", 6, 2021),
+      (10, "Amy", 5, 2022),
+      (11, "Amy", 6, 2023),
+      (12, "Amy", 7, 2024),
+      (13, "John", 7, 2024)).toDF("id", "name", "score", "year")
+
+    val window = Window.partitionBy($"year").orderBy($"score".desc)
+    val window2 = window.rowsBetween(Window.unboundedPreceding, Window.currentRow)
+    val window3 = window.rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)
+
+    Seq(true, false).foreach { enableEvaluator =>
+      withSQLConf(SQLConf.USE_PARTITION_EVALUATOR.key -> enableEvaluator.toString) {

Review Comment:
   no need to test it twice. just test with the default



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363851695


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,6 +52,16 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
+  /**
+   * Require other window functions to have the same window frame as the rank-like functions.
+   */
+  private def satisfySpecifiedWindowFrame(
+      windowExpression: NamedExpression): Boolean = windowExpression match {
+    case Alias(WindowExpression(_, WindowSpecDefinition(_, _,
+    SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))), _) => true
+    case _ => false
+  }
+
   private def support(
       windowExpression: NamedExpression): Boolean = windowExpression match {

Review Comment:
   actually, this function can take window function as input now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] `InferWindowGroupLimit` causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1364796972


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,23 +52,32 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
-  private def support(
+  /**
+   * All window expressions should use expanding window, so that we can safely do early stop.

Review Comment:
   ```suggestion
      * All window expressions should use the same expanding window, so that we can safely do the early stop.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] `InferWindowGroupLimit` causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43385:
URL: https://github.com/apache/spark/pull/43385#issuecomment-1770845622

   Merged to master/branch-3.5
   @cloud-fan Thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "beliefer (via GitHub)" <gi...@apache.org>.
beliefer commented on PR #43385:
URL: https://github.com/apache/spark/pull/43385#issuecomment-1765765935

   The GA failure is unrelated.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363850211


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,6 +52,16 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
+  /**
+   * Require other window functions to have the same window frame as the rank-like functions.
+   */
+  private def satisfySpecifiedWindowFrame(
+      windowExpression: NamedExpression): Boolean = windowExpression match {
+    case Alias(WindowExpression(_, WindowSpecDefinition(_, _,
+    SpecifiedWindowFrame(RowFrame, UnboundedPreceding, CurrentRow))), _) => true
+    case _ => false
+  }
+
   private def support(
       windowExpression: NamedExpression): Boolean = windowExpression match {

Review Comment:
   We can simplify it now
   ```
   if (isExpandingWindow(windowExpression)) {
     windowExpression.child.asInstanceOf[WindowExpression].windowFunction match {
       case _: Rank | _: DenseRank | _: RowNumber => true
       case _ => false
     }
   } else {
     false
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45543][SQL] WindowGroupLimit causes bug if the other window functions haven't the same window frame as the rank-like functions [spark]

Posted by "cloud-fan (via GitHub)" <gi...@apache.org>.
cloud-fan commented on code in PR #43385:
URL: https://github.com/apache/spark/pull/43385#discussion_r1363852786


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InferWindowGroupLimit.scala:
##########
@@ -52,6 +52,16 @@ object InferWindowGroupLimit extends Rule[LogicalPlan] with PredicateHelper {
     if (limits.nonEmpty) Some(limits.min) else None
   }
 
+  /**
+   * Require other window functions to have the same window frame as the rank-like functions.

Review Comment:
   ```
   All window expressions should use expanding window, so that we can safely do early stop.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org