Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/12 22:10:58 UTC

[GitHub] [arrow] westonpace commented on pull request #12841: ARROW-15526: [Python] Support for Dataset.join

westonpace commented on PR #12841:
URL: https://github.com/apache/arrow/pull/12841#issuecomment-1097278888

   > Does the join performance depend on which table is left or right?
   
   Yes.
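
As a rough illustration of why the sides can matter, here is a minimal sketch of a generic in-memory hash join. It is not Arrow's implementation, and the names `hash_join`, `small`, and `large` are hypothetical. It assumes a join that builds a hash table from one input and streams the other past it, so the memory held during the join scales with whichever input is chosen as the build side.

```python
def hash_join(build_rows, probe_rows, key):
    """Inner hash join: index build_rows in a dict, then stream probe_rows."""
    # Build phase: every row of the build side is held in memory.
    table = {}
    for row in build_rows:
        table.setdefault(row[key], []).append(row)
    stored = sum(len(bucket) for bucket in table.values())

    # Probe phase: one dictionary lookup per probe row.
    joined = []
    for row in probe_rows:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined, stored


# Hypothetical inputs: a 3-row table and a 1000-row table sharing an "id" key.
small = [{"id": i, "name": f"n{i}"} for i in range(3)]
large = [{"id": i % 3, "value": i} for i in range(1000)]

rows_a, stored_a = hash_join(small, large, "id")  # build on the small input
rows_b, stored_b = hash_join(large, small, "id")  # build on the large input
```

Both calls return the same joined rows, but the second holds all 1000 rows of the large input in its hash table while the first holds only 3. With no optimizer to pick the cheaper arrangement, that choice falls to whoever writes the plan.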
   
   > I assume we don't have statistics or optimizer at this point which is different from databases and might confuse the end users.
   
   Correct.  We do not have these things, and without them it is easy to create inefficient plans.
   
   Figuring out query optimization is a big question and I don't know of anyone tackling it at the moment.  Calcite looks good, but it's a JVM technology, so it would be a challenge to integrate.  For now, my working assumption in the C++ implementation has been that we are given an optimized plan and should execute it as literally as possible.  One semi-related discussion is ARROW-16172 (trying to figure out whether the C++ should implicitly cast or reject a plan).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
