Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/15 23:32:35 UTC

[GitHub] [spark] HeartSaVioR commented on a change in pull request #35526: [SPARK-38214][SS] No need to filter data when the sliding window length is not redundant

HeartSaVioR commented on a change in pull request #35526:
URL: https://github.com/apache/spark/pull/35526#discussion_r807405322



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -3927,8 +3927,12 @@ object TimeWindowing extends Rule[LogicalPlan] {
           val projections = windows.map(_ +: child.output)
 
           val filterExpr =
-            window.timeColumn >= windowAttr.getField(WINDOW_START) &&
-              window.timeColumn < windowAttr.getField(WINDOW_END)
+            if (window.windowDuration % window.slideDuration == 0) {

Review comment:
       Probably good to leave a code comment like the one below:
   
   > When the condition `windowDuration % slideDuration == 0` is fulfilled, the estimate of the number of windows becomes exact, which means all produced windows are valid.
   
   I'm not a native speaker, so the sentence may not be perfect, but it should be acceptable and understandable.
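   
   For illustration, here is a standalone sketch (plain Scala, not the Analyzer code; `startTime` is assumed to be 0) of why this holds: the candidate count mirrors the `ceil(windowDuration / slideDuration)` estimate, and the assertions show the estimate is exact precisely when `windowDuration % slideDuration == 0`.
   
   ```scala
   // Enumerate the candidate windows for a timestamp and keep only the
   // ones that actually contain it (startTime assumed to be 0).
   def validWindows(ts: Long, windowDuration: Long, slideDuration: Long): Seq[(Long, Long)] = {
     val maxWindows = math.ceil(windowDuration.toDouble / slideDuration).toInt
     val lastStart = ts - (ts % slideDuration)  // latest candidate start <= ts
     (0 until maxWindows)
       .map(i => lastStart - i * slideDuration)          // candidate starts
       .map(start => (start, start + windowDuration))    // candidate windows
       .filter { case (start, end) => start <= ts && ts < end }
   }
   
   // windowDuration % slideDuration == 0: the estimate (2) is exact.
   assert(validWindows(ts = 7, windowDuration = 10, slideDuration = 5).size == 2)
   // Otherwise some candidates fall outside and must be filtered out:
   // the estimate is 4, but only 3 windows contain the timestamp.
   assert(validWindows(ts = 7, windowDuration = 10, slideDuration = 3).size == 3)
   ```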

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/DataFrameTimeWindowingSuite.scala
##########
@@ -490,4 +490,46 @@ class DataFrameTimeWindowingSuite extends QueryTest with SharedSparkSession {
       assert(attributeReference.dataType == tuple._2)
     }
   }
+
+  test("SPARK-38214: No need to filter data when the sliding window length is not redundant") {

Review comment:
       It would be nice to be clear about what we want to test.
   
   If we want to verify that the change still produces the right windows, I'd rather verify the boundaries of the time ranges explicitly, even if it is more verbose.
   
   If such a test already exists and we want to ensure we don't inject a comparison expression into the time window calculation, we probably need to look into the logical plan (especially the Filter node) and verify the expression used in the Filter, as sketched below.
   
   If such a test already exists and you feel uneasy verifying the expression in the Filter node, it's OK to skip adding the test, since the functionality is then validated by the existing tests.
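   
   To make the plan-level option concrete, here is a sketch of what such a check might look like inside a suite with `SharedSparkSession` (the query shape and the string match on the condition are illustrative assumptions, not code from this PR):
   
   ```scala
   import org.apache.spark.sql.catalyst.plans.logical.Filter
   import org.apache.spark.sql.functions.window
   import spark.implicits._
   
   // A sliding window whose duration is a multiple of the slide.
   val df = spark.range(10)
     .selectExpr("timestamp_seconds(id) AS time", "id AS value")
     .select(window($"time", "10 seconds", "5 seconds"), $"value")
   
   // Collect the conditions of all Filter nodes in the analyzed plan.
   val filterConditions = df.queryExecution.analyzed.collect {
     case f: Filter => f.condition
   }
   // With the proposed change we would expect no window-boundary
   // comparison (time >= start && time < end) in any of them; the
   // string check below is only a rough illustration of that assert.
   assert(!filterConditions.exists(_.toString.contains(">=")))
   ```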




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org