You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/23 13:19:31 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #571: Collapse sort into window expr and do sort within logical phase

alamb commented on a change in pull request #571:
URL: https://github.com/apache/arrow-datafusion/pull/571#discussion_r657089158



##########
File path: datafusion/src/logical_plan/expr.rs
##########
@@ -1452,11 +1452,18 @@ impl fmt::Debug for Expr {
             }
             Expr::WindowFunction {
                 fun,
-                ref args,
+                args,
+                partition_by,

Review comment:
       It seems like the partition_by and order_by more logically belong on the `LogicalPlan::Window` node if they can be shared across `Expr::WindowFunction`
   

##########
File path: datafusion/src/physical_plan/planner.rs
##########
@@ -265,9 +266,48 @@ impl DefaultPhysicalPlanner {
                 }
 
                 let input_exec = self.create_initial_plan(input, ctx_state)?;
-                let physical_input_schema = input_exec.schema();
-                let logical_input_schema = input.as_ref().schema();
 
+                // at this moment we are guaranteed by the logical planner
+                // to have all the window_expr to have equal sort key

Review comment:
       Is the design that the planner will group all WindowExpr that have the same window (ORDER BY, PARTITION BY) into the same `LogicalPlan:Window` node? That sounds like a good plan to me 👍 
   
   It might (eventually) be worth validating that invariant here and `assert`ing (or returning an Error` if the sort  keys are not the same for multiple window exprs (mostly to catch potential bugs faster in the future)

##########
File path: datafusion/src/sql/planner.rs
##########
@@ -2884,11 +2881,9 @@ mod tests {
         let sql = "SELECT order_id, MAX(qty) OVER (ORDER BY order_id), MIN(qty) OVER (ORDER BY order_id DESC) from orders";
         let expected = "\
         Projection: #orders.order_id, #MAX(orders.qty), #MIN(orders.qty)\
-        \n  WindowAggr: windowExpr=[[MAX(#orders.qty)]]\
-        \n    Sort: #orders.order_id ASC NULLS FIRST\
-        \n      WindowAggr: windowExpr=[[MIN(#orders.qty)]]\
-        \n        Sort: #orders.order_id DESC NULLS FIRST\
-        \n          TableScan: orders projection=None";
+        \n  WindowAggr: windowExpr=[[MAX(#orders.qty) ORDER BY [#orders.order_id ASC NULLS FIRST]]]\

Review comment:
       Yeah this is cool to see two different WindowAggr nodes with different ORDER BY clauses. 👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org