You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "MichaelScofield (via GitHub)" <gi...@apache.org> on 2023/03/09 11:26:26 UTC

[GitHub] [arrow-datafusion] MichaelScofield opened a new issue, #5529: Failed to execute sql with subquery

MichaelScofield opened a new issue, #5529:
URL: https://github.com/apache/arrow-datafusion/issues/5529

   **Describe the bug**
   <!--
   A clear and concise description of what the bug is.
   -->
   
   Datafusion failed to execute the sql that has its subquery wrapped in two or more parentheses. See the reproduce.
   
   **To Reproduce**
   <!--
   Steps to reproduce the behavior:
   -->
   
   In datafusion-cli:
   
   ```text
   DataFusion CLI v19.0.0
   ❯ CREATE EXTERNAL TABLE foo (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   0 rows in set. Query took 0.009 seconds.
   ❯ SELECT * FROM foo WHERE a IN ((SELECT a FROM foo));
   This feature is not implemented: Physical plan does not support logical expression (<subquery>)
   ❯ explain SELECT * FROM foo WHERE a IN ((SELECT a FROM foo));
   +--------------+----------------------------------------------------------------------------+
   | plan_type    | plan                                                                       |
   +--------------+----------------------------------------------------------------------------+
   | logical_plan | Filter: foo.a = (<subquery>)                                               |
   |              |   Subquery:                                                                |
   |              |     Projection: foo.a                                                      |
   |              |       TableScan: foo                                                       |
   |              |   TableScan: foo projection=[a, b], partial_filters=[foo.a = (<subquery>)] |
   |              |     Subquery:                                                              |
   |              |       Projection: foo.a                                                    |
   |              |         TableScan: foo                                                     |
   +--------------+----------------------------------------------------------------------------+
   1 row in set. Query took 0.008 seconds.
   ```
   
   If the subquery is wrapped in "(( ))" instead of "( )", datafusion failed with message "`This feature is not implemented: Physical plan does not support logical expression (<subquery>)`".
   
   **Expected behavior**
   <!--
   A clear and concise description of what you expected to happen.
   -->
   
   Both mysql and postgresql can execute that sql. And when subquery is wrapped in single pair of parenthesis, the sql got executed correctly in datafusion, too.
   
   **Additional context**
   <!--
   Add any other context about the problem here.
   -->
   
   I'm a little hesitate to call it a bug. Clearly the subquery logical plan node is not supported in physical plan, yet. However, after some investigation on the codes, I think we can come up with a walkaround in the mean time.
   
   It's because the subquery "`where a in (( select a from foo ))`" got interpreted to `Expr::InList`, while the one with "( )" is `Expr::InSubquery`. Datafusion optimizes the `Expr::InSubquery` in `DecorrelateWhereIn` to join, effectively not producing any subqueries that physical plan cannot run.
   
   So the problem could be solved if we can make this specific "(( ))" case interpreted as `Expr::InSubquery`. That might be easy to be done in `ExprRewriter::mutate`:
   
   https://github.com/apache/arrow-datafusion/blob/ff013e24559850541c54cf5d254922649ef37d8c/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L395-L399
   
   WDYT? @alamb @waynexia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] MichaelScofield commented on issue #5529: Failed to execute sql with subquery

Posted by "MichaelScofield (via GitHub)" <gi...@apache.org>.
MichaelScofield commented on issue #5529:
URL: https://github.com/apache/arrow-datafusion/issues/5529#issuecomment-1463402334

   I've created the fix in https://github.com/apache/arrow-datafusion/pull/5542, PTAL @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5529: Failed to execute sql with subquery

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5529:
URL: https://github.com/apache/arrow-datafusion/issues/5529#issuecomment-1462422427

   > So the problem could be solved if we can make this specific "(( ))" case interpreted as Expr::InSubquery. That might be easy to be done in ExprRewriter::mutate:
   
   I agree that rewriting `a IN ((...))` if the content is a single subquery makes sense to me
   
   Maybe we jus need to update the check  https://github.com/apache/arrow-datafusion/blob/ff013e24559850541c54cf5d254922649ef37d8c/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#LL407
   
   To also allow subqueries


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #5529: Failed to execute sql with subquery

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #5529: Failed to execute sql with subquery
URL: https://github.com/apache/arrow-datafusion/issues/5529


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org