Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/15 18:57:15 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue, #3843: Implement nested join optimization

Dandandan opened a new issue, #3843:
URL: https://github.com/apache/arrow-datafusion/issues/3843

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   For complex queries, such as those in TPC-H and TPC-DS, it is essential to find a good join order.
   `HashBuildProbeOrder` implements a rule to optimize the probe / build side of joins, but this is only a `local` optimization.
   
   We should implement an algorithm that finds (or at least tries to find) a more global optimum based on the total estimated cost of the joins.
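
   To make "a more global optimum based on total estimated cost" concrete, here is a minimal sketch of one textbook approach: dynamic programming over table subsets, scoring each plan by the sum of its estimated intermediate result sizes. Everything in it is an assumption for illustration (the function name `best_join_order`, the dictionary inputs, the row counts and selectivities). It is not DataFusion code and not necessarily the algorithm this issue should end up with.

```python
from itertools import combinations


def best_join_order(cards, selectivity):
    """Exhaustive dynamic programming over table subsets (bushy plans allowed).

    cards:       dict mapping table name -> estimated row count (hypothetical input)
    selectivity: dict mapping frozenset({a, b}) -> estimated join selectivity;
                 a missing pair is treated as a cross product (selectivity 1.0)
    Returns (cost, plan), where plan is a nested tuple of table names and
    cost is the sum of the estimated sizes of all intermediate join results.
    """
    tables = sorted(cards)

    def cardinality(subset):
        # Estimated rows after joining every table in `subset`.
        rows = 1.0
        for t in subset:
            rows *= cards[t]
        for a, b in combinations(sorted(subset), 2):
            rows *= selectivity.get(frozenset((a, b)), 1.0)
        return rows

    # best[subset] = (cost, plan); scanning a single table costs nothing here.
    best = {frozenset([t]): (0.0, t) for t in tables}

    for size in range(2, len(tables) + 1):
        for combo in combinations(tables, size):
            subset = frozenset(combo)
            out_rows = cardinality(subset)
            candidates = []
            # Pin the first table to the left side so each split is tried once.
            first, rest = combo[0], combo[1:]
            for k in range(len(rest)):
                for extra in combinations(rest, k):
                    left = frozenset((first,) + extra)
                    right = subset - left
                    cost = best[left][0] + best[right][0] + out_rows
                    candidates.append((cost, (best[left][1], best[right][1])))
            best[subset] = min(candidates, key=lambda c: c[0])

    return best[frozenset(tables)]


# Made-up statistics: joining b and c first keeps the intermediate result small.
cost, plan = best_join_order(
    {"a": 1000, "b": 100, "c": 10},
    {frozenset(("a", "b")): 0.01, frozenset(("b", "c")): 0.1},
)
print(plan)  # ('a', ('b', 'c'))
```

   A purely local rule only looks at one join at a time, whereas this compares complete plans. The price is exponential work in the number of tables, which is why the material linked below also covers heuristics for larger queries.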
   
   **Describe the solution you'd like**
   Implement an efficient algorithm for optimizing the join order.
   I'm not sure what the SOTA is on this. Some material I found with some Googling:
   
   https://db.in.tum.de/teaching/ws1415/queryopt/chapter3.pdf
   https://db.in.tum.de/~radke/papers/hugejoins.pdf
   https://www.cockroachlabs.com/blog/join-ordering-pt1/
   https://www.cockroachlabs.com/blog/join-ordering-ii-the-ikkbz-algorithm/
   http://mlwiki.org/index.php/Join_Ordering
   
   **Describe alternatives you've considered**
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] cristian-ilies-vasile commented on issue #3843: Implement nested join optimization

Posted by "cristian-ilies-vasile (via GitHub)" <gi...@apache.org>.
cristian-ilies-vasile commented on issue #3843:
URL: https://github.com/apache/arrow-datafusion/issues/3843#issuecomment-1517378497

   Other resources:
   Simplicity Done Right for Join Ordering - https://www.cidrdb.org/cidr2021/papers/cidr2021_paper01.pdf
   The MonetDB Architecture Martin Kersten CWI - https://homepages.cwi.nl/~manegold/teaching/adt/lectures/lecture2.pdf




[GitHub] [arrow-datafusion] andygrove commented on issue #3843: Implement nested join optimization

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #3843:
URL: https://github.com/apache/arrow-datafusion/issues/3843#issuecomment-1293834929

   We could also look at DuckDB join reordering: https://www.youtube.com/watch?v=aNRoR0Z3SzU
   
   I filed a duplicate issue before I saw this one, although mine is specifically for the logical plan. https://github.com/apache/arrow-datafusion/issues/3984
   

