You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/27 05:17:10 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

jorgecarleitao opened a new pull request #9333:
URL: https://github.com/apache/arrow/pull/9333


   This PR adds support for custom [logical] optimization rules to be injected to the `ExecutionContext`, thereby allowing people to create their own optimizers and run them through the logical plan during the optimization step.
   
   Another way of thinking about this is that `QueryPlanner` is now only responsible for converting a logical plan into an execution plan, while the `optimizerRule` is responsible for re-writing a logical plan into a logical plan:
   
   * `OptimizerRule`: `LogicalPlan -> LogicalPlan`
   * `QueryPlanner`: `LogicalPlan -> ExecutionPlan`
   
   The second commit on this PR is just a small simplification that helps people writing the rules without having to worry about the `Explain`. This is important because forgetting about it has major consequences to UX (explain does not work as intended)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao closed pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9333:
URL: https://github.com/apache/arrow/pull/9333


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#issuecomment-768056067


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=h1) Report
   > Merging [#9333](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=desc) (6ae574b) into [master](https://codecov.io/gh/apache/arrow/commit/437c8c944acb3479b76804f041f5f8cbce309fa7?el=desc) (437c8c9) will **increase** coverage by `0.01%`.
   > The diff coverage is `97.82%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9333/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9333      +/-   ##
   ==========================================
   + Coverage   81.88%   81.90%   +0.01%     
   ==========================================
     Files         215      214       -1     
     Lines       52988    52981       -7     
   ==========================================
   + Hits        43391    43395       +4     
   + Misses       9597     9586      -11     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/tests/user\_defined\_plan.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3VzZXJfZGVmaW5lZF9wbGFuLnJz) | `80.76% <93.75%> (+1.47%)` | :arrow_up: |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.76% <100.00%> (+0.51%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvZmlsdGVyX3B1c2hfZG93bi5ycw==) | `97.65% <100.00%> (ø)` | |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <100.00%> (ø)` | |
   | [...t/datafusion/src/optimizer/projection\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvcHJvamVjdGlvbl9wdXNoX2Rvd24ucnM=) | `97.70% <100.00%> (ø)` | |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `51.85% <100.00%> (+2.83%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=footer). Last update [437c8c9...6ae574b](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#discussion_r565654606



##########
File path: rust/datafusion/tests/user_defined_plan.rs
##########
@@ -209,51 +209,31 @@ struct TopKOptimizerRule {}
 impl OptimizerRule for TopKOptimizerRule {
     // Example rewrite pass to insert a user defined LogicalPlanNode
     fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
-        match plan {
-            // Note: this code simply looks for the pattern of a Limit followed by a
-            // Sort and replaces it by a TopK node. It does not handle many
-            // edge cases (e.g multiple sort columns, sort ASC / DESC), etc.
-            LogicalPlan::Limit { ref n, ref input } => {
-                if let LogicalPlan::Sort {
-                    ref expr,
-                    ref input,
-                } = **input
-                {
-                    if expr.len() == 1 {
-                        // we found a sort with a single sort expr, replace with a a TopK
-                        return Ok(LogicalPlan::Extension {
-                            node: Arc::new(TopKPlanNode {
-                                k: *n,
-                                input: self.optimize(input.as_ref())?,
-                                expr: expr[0].clone(),
-                            }),
-                        });
-                    }
+        // Note: this code simply looks for the pattern of a Limit followed by a
+        // Sort and replaces it by a TopK node. It does not handle many
+        // edge cases (e.g multiple sort columns, sort ASC / DESC), etc.
+        if let LogicalPlan::Limit { ref n, ref input } = plan {

Review comment:
       👍 much nicer




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#issuecomment-768537628


   I definitely want to review this one but probably won't get to this until the weekend,


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#issuecomment-768037145


   https://issues.apache.org/jira/browse/ARROW-11395


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#discussion_r565652596



##########
File path: rust/datafusion/tests/user_defined_plan.rs
##########
@@ -178,7 +178,9 @@ async fn topk_plan() -> Result<()> {
 }
 
 fn make_topk_context() -> ExecutionContext {
-    let config = ExecutionConfig::new().with_query_planner(Arc::new(TopKQueryPlanner {}));
+    let config = ExecutionConfig::new()

Review comment:
       For other reviewers, here is a good example showing how the interface changes. Previously using custom optimizers meant you had to effectively implement a custom "query planner".
   
   I like it. 👍 

##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -324,19 +324,15 @@ impl ExecutionContext {
 
     /// Optimize the logical plan by applying optimizer rules
     pub fn optimize(&self, plan: &LogicalPlan) -> Result<LogicalPlan> {
-        // Apply standard rewrites and optimizations
+        let optimizers = &self.state.lock().unwrap().config.optimizers;
+
+        let mut new_plan = plan.clone();
         debug!("Logical plan:\n {:?}", plan);
-        let mut plan = ProjectionPushDown::new().optimize(&plan)?;
-        plan = FilterPushDown::new().optimize(&plan)?;
-        plan = HashBuildProbeOrder::new().optimize(&plan)?;
+        for optimizer in optimizers {
+            new_plan = optimizer.optimize(&new_plan)?;

Review comment:
       Eventually I would love to avoid this copying (aka this should ideally be `new_plan = optimizer.optimize(new_plan)` -- but we'll get there eventually :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove commented on a change in pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#discussion_r567331656



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -529,6 +525,15 @@ impl ExecutionConfig {
         self.query_planner = query_planner;
         self
     }
+
+    /// Adds a new [`OptimizerRule`]
+    pub fn add_optimizer_rule(

Review comment:
       I don't have a use case in mind, but what if we needed to insert an optimizer rule before the standard ones run, or override one of the standard rules?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#discussion_r567369836



##########
File path: rust/datafusion/src/execution/context.rs
##########
@@ -529,6 +525,15 @@ impl ExecutionConfig {
         self.query_planner = query_planner;
         self
     }
+
+    /// Adds a new [`OptimizerRule`]
+    pub fn add_optimizer_rule(

Review comment:
       Had the same though too and also did not see an immediate use case. So here I just delayed the decision for that API to whenever/if someone has a need for it (`with_optimizer_rules` or something would work).
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9333: ARROW-11395: [DataFusion] Support custom optimizers

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9333:
URL: https://github.com/apache/arrow/pull/9333#issuecomment-768056067


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=h1) Report
   > Merging [#9333](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=desc) (62f89eb) into [master](https://codecov.io/gh/apache/arrow/commit/437c8c944acb3479b76804f041f5f8cbce309fa7?el=desc) (437c8c9) will **increase** coverage by `0.01%`.
   > The diff coverage is `100.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9333/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9333      +/-   ##
   ==========================================
   + Coverage   81.88%   81.90%   +0.01%     
   ==========================================
     Files         215      214       -1     
     Lines       52988    52983       -5     
   ==========================================
   + Hits        43391    43396       +5     
   + Misses       9597     9587      -10     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `88.76% <100.00%> (+0.51%)` | :arrow_up: |
   | [rust/datafusion/src/optimizer/filter\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvZmlsdGVyX3B1c2hfZG93bi5ycw==) | `97.65% <100.00%> (ø)` | |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <100.00%> (ø)` | |
   | [...t/datafusion/src/optimizer/projection\_push\_down.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvcHJvamVjdGlvbl9wdXNoX2Rvd24ucnM=) | `97.70% <100.00%> (ø)` | |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `51.85% <100.00%> (+2.83%)` | :arrow_up: |
   | [rust/datafusion/tests/user\_defined\_plan.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3Rlc3RzL3VzZXJfZGVmaW5lZF9wbGFuLnJz) | `80.37% <100.00%> (+1.08%)` | :arrow_up: |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9333/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=footer). Last update [437c8c9...62f89eb](https://codecov.io/gh/apache/arrow/pull/9333?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org