Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/22 19:36:03 UTC

[GitHub] [arrow-datafusion] isidentical opened a new issue, #3929: [META] Improving cost calculations and cost based optimizations

isidentical opened a new issue, #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929

   This is a meta issue for improving cost calculations and cost-based optimizations (CBOs) in DataFusion. We already collect some statistics (mainly from the table sources), and some execution plan nodes estimate statistics for their output. The overall idea is to improve these, as well as the CBOs they make possible.
   
   ### Main Goals
   - Have enough statistics to start nested join optimizations (#3843). This involves being able to estimate the weight of each join side and to reorder join sides globally, minimizing the overall cost of parent joins by reducing the output as much as possible at the bottom levels.
   - Provide a more reliable static analysis phase for physical execution operators, so that range-based pruning and predicate pruning can build on the existing infrastructure in their implementations.
   - What else?
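
The first goal above can be illustrated with a minimal Rust sketch, assuming a textbook estimate for an inner equi-join, rows(L) * rows(R) / max(ndv(L), ndv(R)), and a greedy smallest-first ordering of join inputs. `InputStats`, `estimate_inner_join_rows`, and `order_by_estimated_size` are hypothetical names used only for illustration; they are not DataFusion APIs, and this is not necessarily the approach the linked issues take.

```rust
/// Hypothetical, simplified statistics for one join input (not a DataFusion type).
#[derive(Debug, Clone, PartialEq)]
struct InputStats {
    name: String,
    /// Estimated number of rows, if known.
    num_rows: Option<u64>,
    /// Estimated number of distinct join-key values, if known.
    distinct_keys: Option<u64>,
}

/// Textbook inner equi-join estimate: |L| * |R| / max(ndv(L), ndv(R)).
/// Returns `None` when any required statistic is missing.
fn estimate_inner_join_rows(left: &InputStats, right: &InputStats) -> Option<u64> {
    let l = left.num_rows?;
    let r = right.num_rows?;
    let max_ndv = left.distinct_keys?.max(right.distinct_keys?);
    if max_ndv == 0 {
        // No distinct keys on either side means nothing can match.
        return Some(0);
    }
    // Multiply in u128 so the product of two u64 values cannot overflow.
    let product = u128::from(l) * u128::from(r);
    let estimate = product / u128::from(max_ndv);
    Some(estimate.min(u128::from(u64::MAX)) as u64)
}

/// Greedy heuristic: place the smallest inputs first so the lowest joins
/// produce as little output as possible. Inputs without a row count go last.
fn order_by_estimated_size(mut inputs: Vec<InputStats>) -> Vec<InputStats> {
    inputs.sort_by_key(|s| s.num_rows.unwrap_or(u64::MAX));
    inputs
}
```

For example, joining a 1000-row input with 100 distinct keys against a 50-row input with 50 distinct keys yields an estimate of 1000 * 50 / 100 = 500 rows, while an input with no row count produces no estimate and is ordered last.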
   
   ### Work in Progress
   
   - [ ] https://github.com/apache/arrow-datafusion/issues/3898
   - [ ] https://github.com/apache/arrow-datafusion/issues/3845
   - What else?
   
   ### Planned
   - [ ] Estimating join cardinalities when the underlying table does not have any statistics (https://github.com/apache/arrow-datafusion/issues/3813#issuecomment-1276643214).
   - What else?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1290854846

   I believe the next step is some sort of design document.




[GitHub] [arrow-datafusion] Dandandan commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1290960189

   Maybe you can share the doc publicly so anyone can make suggestions?




[GitHub] [arrow-datafusion] alamb commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1292676358

   I plan to review the doc carefully tomorrow ❤️ 




[GitHub] [arrow-datafusion] isidentical commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
isidentical commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1290870366

   I'd be happy to start one, and if anyone is interested I can also give write access (send me your Google account email at `isidentical@gmail.com`).




[GitHub] [arrow-datafusion] isidentical commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
isidentical commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1287893324

   @alamb @Dandandan @mingmwang I've created the meta/epic issue, as we discussed.




[GitHub] [arrow-datafusion] isidentical commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
isidentical commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1292224231

   It should be publicly accessible now: https://docs.google.com/document/d/1M4mmV7KA1LSj-D-WJA338B4ydlm-8A8D5OPuDE5_SD4/ (also pinning this to the issue)
   
   It is an overall exploration of what we are doing right now and how it can help us in the future (as well as some possible points), but it is at a very early stage. I'd be thrilled to hear what you are thinking, as well as about other potentially unexplored areas.




[GitHub] [arrow-datafusion] isidentical commented on issue #3929: [META] Improving cost calculations and cost based optimizations

Posted by GitBox <gi...@apache.org>.
isidentical commented on issue #3929:
URL: https://github.com/apache/arrow-datafusion/issues/3929#issuecomment-1292691642

   Thanks @alamb! I'll also try to walk through it from scratch with real-world examples in [tomorrow's meeting](https://arrow.apache.org/datafusion/contributor-guide/communication.html#sync-up-video-calls) (if we have time for that in this meeting, and if I can actually make it there), in case anyone else is planning to attend.

