You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2024/03/20 13:26:19 UTC

[PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

alamb opened a new pull request, #9708:
URL: https://github.com/apache/arrow-datafusion/pull/9708

   ## Which issue does this PR close?
   
   Related to https://github.com/apache/arrow-datafusion/issues/9637
   
   ## Rationale for this change
   
   @jayzhan211  @mustafasrepo  and I have been discussing how to speed up planning by avoiding copying Exprs  / LogicalPlans
   
   
   ## What changes are included in this PR?
   
   This PR is a prototype of a suggestion here https://github.com/apache/arrow-datafusion/issues/9637#issuecomment-2002396168 to see if I can get a significant boost in Expr simplify
   
   The theory is that if I can rerite the optimizer to take ownership of the plans they can be rewritten in place
   
   If this works out, I'll turn this into an actual PR for review. 
   
   ## Are these changes tested?
   
   NA
   ## Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #9708:
URL: https://github.com/apache/arrow-datafusion/pull/9708#issuecomment-2016824822

   Continuing in https://github.com/apache/arrow-datafusion/pull/9780, so closing this one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #9708:
URL: https://github.com/apache/arrow-datafusion/pull/9708#issuecomment-2012196911

   Update: I think this PR now works well enough to show this is a promising approach. Specifically, by just fixing SimplifyExprs to not copy plans around, the planning benchmark goes 25% faster. I am pretty sure if we fix common subexpr eliminate and a few other passes we could get the benchmark running 2x as fast
   
   ```
   physical_plan_tpch_all  time:   [54.371 ms 54.512 ms 54.675 ms]
                           change: [-24.152% -23.333% -22.648%] (p = 0.00 < 0.05)
                           Performance has improved.
   ```
   
   The code in this PR is atrocious, but I think works well enough to demonstrate the idea
   
   The core problem is that when the LogicalPasses work, they copy the LogicalPlans many times (e.g. each call to `new_with_exprs` or `new_with_children` results in Copying the entire node and its embedded exprs (though not its children as they are `Arc`d)
   
   
   cc @sadboy  who I think has observed this before as well


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #9708: WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses
URL: https://github.com/apache/arrow-datafusion/pull/9708


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed pull request #9708: WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses
URL: https://github.com/apache/arrow-datafusion/pull/9708


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Avoid copying LogicalPlans / Exprs during OptimizerPasses [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #9708:
URL: https://github.com/apache/arrow-datafusion/pull/9708#discussion_r1532080163


##########
datafusion/optimizer/src/optimizer.rs:
##########
@@ -279,10 +294,10 @@ impl Optimizer {
     /// invoking observer function after each call
     pub fn optimize<F>(
         &self,
-        plan: &LogicalPlan,
+        plan: LogicalPlan,

Review Comment:
   rewriting the optimizer passes to take in owned `LogicalPlan` is the key idea
   
   I haven't plumbed it completely through yet, but the approach is looking promising



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org