Posted to github@arrow.apache.org by "avantgardnerio (via GitHub)" <gi...@apache.org> on 2023/03/31 15:25:50 UTC

[GitHub] [arrow-datafusion] avantgardnerio commented on issue #5808: `decorrelate_where_in` reports error when optimizing `limit subquery`

avantgardnerio commented on issue #5808:
URL: https://github.com/apache/arrow-datafusion/issues/5808#issuecomment-1492124630

   > take a look if you have time
   
   Unfortunately, my availability is low right now. If @mingmwang's claim is correct (and I have no reason to doubt it) that:
   
   ```
   SELECT t1.id, t1.name FROM t1 WHERE t1.id in (SELECT t2.id FROM t2 where t1.name = t2.name limit 10)
   ```
   
   > can not be de-correlated
   
   then I think we'll need to have the ability to execute plans even if this rule fails (i.e. nested loop execution). I don't think I ever intended it to decorrelate _all_ subqueries - it was designed to hit the 80% case and get TPC-H working.
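   
   To make the fallback concrete, here is a minimal sketch of what nested loop evaluation means for the query above: the subquery is re-evaluated for every outer row, with the limit applied per outer row. The `Row` type and the `nested_loop_in_subquery` function are hypothetical and only illustrate the semantics; they are not DataFusion's executor. SQL does not define which rows a `LIMIT` without `ORDER BY` keeps, so this sketch simply assumes slice order.
   
   ```rust
   // Hypothetical row type standing in for the `id` and `name` columns of t1/t2.
   #[derive(Debug, Clone, PartialEq)]
   struct Row {
       id: i64,
       name: String,
   }
   
   /// Evaluates
   ///   SELECT t1.id, t1.name FROM t1
   ///   WHERE t1.id IN (SELECT t2.id FROM t2 WHERE t1.name = t2.name LIMIT <limit>)
   /// by running the subquery once per outer row (nested loop).
   /// Assumption: LIMIT keeps the first `limit` matching rows in slice order.
   fn nested_loop_in_subquery(t1: &[Row], t2: &[Row], limit: usize) -> Vec<Row> {
       t1.iter()
           .filter(|outer| {
               t2.iter()
                   // Correlated predicate: t1.name = t2.name
                   .filter(|inner| inner.name == outer.name)
                   // The limit applies per outer row, after the correlated filter.
                   .take(limit)
                   // IN check: t1.id = t2.id for some surviving inner row
                   .any(|inner| inner.id == outer.id)
           })
           .cloned()
           .collect()
   }
   ```
   
   Because the limit is applied after the correlated filter and separately for each outer row, the result can change with the limit in a way a single join over `t2` does not express directly.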
   
   At the time, returning an error was considered the proper thing to do. The API has since changed, so the rule now needs to be updated to plumb `Ok(None)` down through all the layers of recursion, which can be verbose and non-trivial.
   
   My recommendation at the time (which I still stand by) is that it would make life considerably simpler for optimizer rule authors if we added a `DataFusionError::CanNotOptimize` error and simply returned it in this case. The optimizer would treat it the same as `Ok(None)`, which keeps the rule code readable and simplifies the plumbing.
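   
   A rough sketch of the idea, using stand-in types rather than DataFusion's real ones: `OptimizeError`, `Plan`, `extract_join_side`, `decorrelate`, and `try_optimize` are all hypothetical names invented for illustration, and `CanNotOptimize` is the proposed variant, not an existing one. Inner helpers bail out with `?`, and only the outermost entry point maps the "not applicable" error to `Ok(None)`.
   
   ```rust
   // Hypothetical stand-in for DataFusionError; not the real type.
   #[derive(Debug, PartialEq)]
   enum OptimizeError {
       /// Proposed variant: the rule does not apply to this plan (not a failure).
       CanNotOptimize(String),
       /// A genuine error that should abort optimization.
       Internal(String),
   }
   
   // Hypothetical, heavily simplified logical plan.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       Scan(String),
       Limit(usize, Box<Plan>),
       InSubquery { outer: Box<Plan>, subquery: Box<Plan> },
       SemiJoin { left: Box<Plan>, right: Box<Plan> },
   }
   
   /// Nested helper: signals "not applicable" through the error channel,
   /// so it does not need to return Result<Option<_>>.
   fn extract_join_side(subquery: &Plan) -> Result<Plan, OptimizeError> {
       match subquery {
           Plan::Limit(..) => Err(OptimizeError::CanNotOptimize(
               "subquery contains LIMIT".to_string(),
           )),
           Plan::Scan(name) if name.is_empty() => Err(OptimizeError::Internal(
               "scan without a table name".to_string(),
           )),
           other => Ok(other.clone()),
       }
   }
   
   /// The rule body: `?` propagates both kinds of error unchanged.
   fn decorrelate(plan: &Plan) -> Result<Plan, OptimizeError> {
       match plan {
           Plan::InSubquery { outer, subquery } => {
               let right = extract_join_side(subquery)?;
               Ok(Plan::SemiJoin { left: outer.clone(), right: Box::new(right) })
           }
           _ => Err(OptimizeError::CanNotOptimize("not an IN subquery".to_string())),
       }
   }
   
   /// Outermost entry point: the only place that translates
   /// CanNotOptimize into Ok(None), i.e. "leave the plan unchanged".
   fn try_optimize(plan: &Plan) -> Result<Option<Plan>, OptimizeError> {
       match decorrelate(plan) {
           Ok(rewritten) => Ok(Some(rewritten)),
           Err(OptimizeError::CanNotOptimize(_)) => Ok(None),
           Err(other) => Err(other),
       }
   }
   ```
   
   With this shape, a plan like the `limit 10` subquery above would come back as `Ok(None)` and continue unoptimized, while real errors would still surface to the caller.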

