
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/06/22 19:12:48 UTC

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #569: Use repartition in window functions to speed up

Dandandan commented on a change in pull request #569:
URL: https://github.com/apache/arrow-datafusion/pull/569#discussion_r656508927



##########
File path: datafusion/src/physical_plan/windows.rs
##########
@@ -412,11 +412,14 @@ impl ExecutionPlan for WindowAggExec {
 
     /// Get the output partitioning of this plan
     fn output_partitioning(&self) -> Partitioning {
-        Partitioning::UnknownPartitioning(1)
+        // because we can have repartitioning using the partition keys
+        // this would be either 1 or more than 1 depending on the presence of
+        // repartitioning
+        self.input.output_partitioning()
     }
 
     fn required_child_distribution(&self) -> Distribution {
-        Distribution::SinglePartition
+        Distribution::UnspecifiedDistribution

Review comment:
   Would this be correct for a window without any `partition by` clause?

   In that case I think the child should be required to provide a single partition, as the aggregate function cannot be computed over only part of the data.
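A minimal Rust sketch of the concern, assuming simplified stand-in types. The `Distribution` enum shown here, its `HashPartitioned` variant, `WindowSpec`, and both functions are hypothetical illustrations, not DataFusion's actual API. A window with no `partition by` keys needs its whole input in one partition, while a keyed window only needs rows with equal keys to end up in the same partition. The second function shows what goes wrong if the single-partition requirement is dropped: `SUM(v) OVER ()` evaluated per partition yields partial totals.

```rust
/// Hypothetical, simplified stand-in for a physical-plan `Distribution` enum;
/// the real enum's variants may differ between DataFusion versions.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum Distribution {
    /// Any partitioning of the input is acceptable.
    UnspecifiedDistribution,
    /// All input rows must arrive in exactly one partition.
    SinglePartition,
    /// Rows with equal values for these keys must land in the same
    /// partition (hypothetical variant used only for illustration).
    HashPartitioned(Vec<String>),
}

/// Hypothetical description of a window expression: only the
/// `PARTITION BY` column names matter for this sketch.
struct WindowSpec {
    partition_by: Vec<String>,
}

/// Pick the distribution a window operator needs from its child.
/// Without `PARTITION BY`, the whole input forms one window, so the
/// input must not be split across partitions.
fn required_child_distribution(spec: &WindowSpec) -> Distribution {
    if spec.partition_by.is_empty() {
        Distribution::SinglePartition
    } else {
        Distribution::HashPartitioned(spec.partition_by.clone())
    }
}

/// Evaluate `SUM(v) OVER ()` (no PARTITION BY) independently in each
/// input partition, i.e. what happens if the single-partition requirement
/// is dropped. Each row receives its own partition's total.
fn sum_over_empty_window_per_partition(partitions: &[Vec<i64>]) -> Vec<Vec<i64>> {
    partitions
        .iter()
        .map(|p| {
            let total: i64 = p.iter().sum();
            vec![total; p.len()]
        })
        .collect()
}
```

For input `[1, 2, 3, 4]` in one partition every row gets the correct total 10; if the same rows are split as `[1, 2]` and `[3, 4]`, they get 3 and 7 instead.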




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org