Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/29 10:00:24 UTC

[GitHub] [arrow-datafusion] liurenjie1024 commented on issue #2633: Introducing a new optimizer framework for datafusion.

liurenjie1024 commented on issue #2633:
URL: https://github.com/apache/arrow-datafusion/issues/2633#issuecomment-1169784027

   Hi @alamb @andygrove, I've finished a simple PoC; you can find the code here: https://github.com/liurenjie1024/rust-opt-framework/tree/main/src/datafusion_poc
   
   Here are the general ideas:
   
   1. To adopt the new heuristic optimizer, we can wrap `HeuristicOptimizer` as an optimizer rule, which works as follows:
   ```
   DataFusion logical plan -> Our logical plan -> HeuristicOptimizer -> Our logical plan -> DataFusion logical plan
   ```
   You can find an implementation here:
   https://github.com/liurenjie1024/rust-opt-framework/blob/main/src/datafusion_poc/rule.rs
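
To illustrate the round trip above, here is a minimal sketch in plain Rust. It is not the code from the linked PoC, and it does not use DataFusion's real API: `DfPlan`, `OurPlan`, `to_our_plan`, `to_df_plan`, the `OptimizerRule` trait, and `HeuristicRule` are all hypothetical stand-ins, and the toy `HeuristicOptimizer` here only merges directly nested limits.

```rust
// Hypothetical stand-ins only; none of these are DataFusion's or the PoC's real types.

/// Stand-in for a DataFusion logical plan.
#[derive(Debug, Clone, PartialEq)]
enum DfPlan {
    Scan(String),
    Filter { predicate: String, input: Box<DfPlan> },
    Limit { n: usize, input: Box<DfPlan> },
}

/// Stand-in for the optimizer framework's own logical plan.
#[derive(Debug, Clone, PartialEq)]
enum OurPlan {
    Scan(String),
    Filter { predicate: String, input: Box<OurPlan> },
    Limit { n: usize, input: Box<OurPlan> },
}

/// DataFusion logical plan -> our logical plan.
fn to_our_plan(plan: &DfPlan) -> OurPlan {
    match plan {
        DfPlan::Scan(t) => OurPlan::Scan(t.clone()),
        DfPlan::Filter { predicate, input } => OurPlan::Filter {
            predicate: predicate.clone(),
            input: Box::new(to_our_plan(input)),
        },
        DfPlan::Limit { n, input } => OurPlan::Limit { n: *n, input: Box::new(to_our_plan(input)) },
    }
}

/// Our logical plan -> DataFusion logical plan.
fn to_df_plan(plan: &OurPlan) -> DfPlan {
    match plan {
        OurPlan::Scan(t) => DfPlan::Scan(t.clone()),
        OurPlan::Filter { predicate, input } => DfPlan::Filter {
            predicate: predicate.clone(),
            input: Box::new(to_df_plan(input)),
        },
        OurPlan::Limit { n, input } => DfPlan::Limit { n: *n, input: Box::new(to_df_plan(input)) },
    }
}

/// Toy heuristic optimizer: collapses directly nested limits into the smaller one.
struct HeuristicOptimizer;

impl HeuristicOptimizer {
    fn optimize(&self, plan: OurPlan) -> OurPlan {
        match plan {
            OurPlan::Limit { n, input } => match self.optimize(*input) {
                OurPlan::Limit { n: inner, input } => OurPlan::Limit { n: n.min(inner), input },
                other => OurPlan::Limit { n, input: Box::new(other) },
            },
            OurPlan::Filter { predicate, input } => OurPlan::Filter {
                predicate,
                input: Box::new(self.optimize(*input)),
            },
            scan => scan,
        }
    }
}

/// Stand-in for an optimizer-rule interface on the DataFusion side.
trait OptimizerRule {
    fn name(&self) -> &str;
    fn optimize(&self, plan: &DfPlan) -> DfPlan;
}

/// The wrapper: converts in, runs the heuristic optimizer, converts back out.
struct HeuristicRule {
    inner: HeuristicOptimizer,
}

impl OptimizerRule for HeuristicRule {
    fn name(&self) -> &str {
        "heuristic_optimizer_rule"
    }

    fn optimize(&self, plan: &DfPlan) -> DfPlan {
        let ours = to_our_plan(plan);
        let optimized = self.inner.optimize(ours);
        to_df_plan(&optimized)
    }
}
```

With this shape, callers only ever see the DataFusion-side plan type; the conversion to and from the framework's own plan stays inside the rule.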
   
   2. To adopt the new Cascades-style cost-based optimizer, we can implement a new `QueryPlanner`, which works as follows:
   ```
   DataFusion logical plan -> Our logical plan -> Cost-based optimizer -> Our physical plan -> DataFusion physical plan
   ```
   You can find an implementation here:
   https://github.com/liurenjie1024/rust-opt-framework/blob/main/src/datafusion_poc/planner.rs
   
   3. For robust CBO behavior without statistics, I prefer to use a trivial cost model, for example one that adds a penalty for operators such as sort and nested loop join. I don't have an implementation of this yet, but I think the optimizer framework is flexible enough that we can add it later.
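
As a rough illustration of such a statistics-free cost model, here is a minimal sketch in plain Rust. It is not part of the PoC: the `PhysicalOp` enum, the `cost` and `cheapest` functions, and the specific penalty values are all assumptions made up for this example.

```rust
// Hypothetical physical operators; not the PoC's or DataFusion's real types.
#[derive(Debug, Clone, PartialEq)]
enum PhysicalOp {
    TableScan,
    Filter(Box<PhysicalOp>),
    Sort(Box<PhysicalOp>),
    HashJoin(Box<PhysicalOp>, Box<PhysicalOp>),
    NestedLoopJoin(Box<PhysicalOp>, Box<PhysicalOp>),
}

// Assumed constants: every operator costs 1, and "expensive" operators
// pay an extra fixed penalty. The values are arbitrary illustrations.
const BASE_COST: u64 = 1;
const SORT_PENALTY: u64 = 10;
const NESTED_LOOP_JOIN_PENALTY: u64 = 100;

/// Cost of a plan without any statistics: sum of per-operator base costs
/// plus fixed penalties for sort and nested loop join.
fn cost(op: &PhysicalOp) -> u64 {
    match op {
        PhysicalOp::TableScan => BASE_COST,
        PhysicalOp::Filter(input) => BASE_COST + cost(input),
        PhysicalOp::Sort(input) => BASE_COST + SORT_PENALTY + cost(input),
        PhysicalOp::HashJoin(l, r) => BASE_COST + cost(l) + cost(r),
        PhysicalOp::NestedLoopJoin(l, r) => {
            BASE_COST + NESTED_LOOP_JOIN_PENALTY + cost(l) + cost(r)
        }
    }
}

/// Picks the cheapest of several equivalent candidate plans, if any.
fn cheapest(candidates: &[PhysicalOp]) -> Option<&PhysicalOp> {
    candidates.iter().min_by_key(|op| cost(op))
}
```

Under these assumed penalties, a hash join over two scans costs less than a nested loop join over the same inputs, so `cheapest` would prefer the hash join even though no row counts are known.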
   

