Posted to github@arrow.apache.org by "jiangzhx (via GitHub)" <gi...@apache.org> on 2023/03/30 04:23:11 UTC

[GitHub] [arrow-datafusion] jiangzhx opened a new issue, #5789: This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

jiangzhx opened a new issue, #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789

   ### Describe the bug
   
   DataFusion CLI v21.0.0
   
   ```
   CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);
   ```
   The query returns:
   This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)
   
   The same query worked in DataFusion CLI v19.0.0.
   
   
   ### To Reproduce
   
   Create `data.csv`:
   
   `echo "1,2" > data.csv`
   
   Then run the following in datafusion-cli:
   ```
   CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on issue #5789: This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1490318113

   I agree that https://github.com/apache/arrow-datafusion/pull/5419 likely caused this issue.
   
   I think the fix is to simply ignore such errors when creating pruning predicates (i.e., when the predicate is not supported).
   
   What do you think, @crepererum?
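   
   A minimal sketch of that suggestion, assuming a hypothetical, heavily simplified predicate type and pruning-predicate builder (`Expr`, `PruningPredicate`, `try_build_pruning_predicate` and `pruning_predicate_or_none` are illustrative names, not DataFusion's API). An unsupported predicate such as `EXISTS (<subquery>)` turns pruning off for the scan instead of failing the query:
   
   ```rust
   /// Toy predicate type (hypothetical, not DataFusion's `Expr`).
   #[derive(Debug)]
   enum Expr {
       Column(String),
       Literal(i64),
       Gt(Box<Expr>, Box<Expr>),
       Exists(String), // stands in for `EXISTS (<subquery>)`
   }
   
   /// Toy pruning predicate: "skip containers whose max(column) <= bound".
   #[derive(Debug, PartialEq)]
   struct PruningPredicate {
       column: String,
       bound: i64,
   }
   
   /// Hypothetical builder that only understands `column > literal`.
   fn try_build_pruning_predicate(expr: &Expr) -> Result<PruningPredicate, String> {
       match expr {
           Expr::Gt(l, r) => match (&**l, &**r) {
               (Expr::Column(c), Expr::Literal(v)) => {
                   Ok(PruningPredicate { column: c.clone(), bound: *v })
               }
               _ => Err(format!("unsupported comparison: {expr:?}")),
           },
           other => Err(format!("unsupported predicate: {other:?}")),
       }
   }
   
   /// The suggested behavior: an unsupported predicate disables pruning for
   /// this scan instead of failing the whole query.
   fn pruning_predicate_or_none(expr: &Expr) -> Option<PruningPredicate> {
       match try_build_pruning_predicate(expr) {
           Ok(p) => Some(p),
           Err(e) => {
               eprintln!("pruning disabled: {e}");
               None
           }
       }
   }
   
   fn main() {
       let gt = Expr::Gt(Box::new(Expr::Column("a".into())), Box::new(Expr::Literal(10)));
       assert_eq!(
           pruning_predicate_or_none(&gt),
           Some(PruningPredicate { column: "a".into(), bound: 10 })
       );
       assert_eq!(pruning_predicate_or_none(&Expr::Exists("<subquery>".into())), None);
   }
   ```
   
   The cost is that the scan may read data it could otherwise have skipped. This stays correct only as long as the full predicate is still applied by a filter above the scan, which the sketch assumes.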


[GitHub] [arrow-datafusion] jiangzhx commented on issue #5789: This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1489852155

   I'm not sure whether PR https://github.com/apache/arrow-datafusion/pull/5419 caused this issue.


[GitHub] [arrow-datafusion] jiangzhx closed issue #5789: This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx closed issue #5789: This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)
URL: https://github.com/apache/arrow-datafusion/issues/5789


[GitHub] [arrow-datafusion] jiangzhx commented on issue #5789: SQL case, This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

Posted by "jiangzhx (via GitHub)" <gi...@apache.org>.
jiangzhx commented on issue #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1498753570

   @crepererum I think you're right; your PR just exposed this issue.
   
   The `Exists`, `InSubquery`, and `ScalarSubquery` expressions are not handled in `create_physical_expr` in `planner.rs`:
   https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/physical-expr/src/planner.rs#L501-L503
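   
   To illustrate the shape of the problem, here is a toy model of such a conversion function. `LogicalExpr`, `PhysicalExpr` and `NotImplemented` are hypothetical stand-ins, not the real `planner.rs` code: the subquery variants have no arm of their own, so they fall into a catch-all error whose text mirrors the one reported in this issue.
   
   ```rust
   use std::fmt;
   
   /// Toy logical expression enum (hypothetical; far smaller than DataFusion's `Expr`).
   #[derive(Debug)]
   enum LogicalExpr {
       Column(String),
       Literal(i64),
       Exists { subquery: String, negated: bool },
       InSubquery { expr: Box<LogicalExpr>, subquery: String, negated: bool },
       ScalarSubquery(String),
   }
   
   /// Toy physical expression: only what can be evaluated directly against a batch.
   #[derive(Debug, PartialEq)]
   enum PhysicalExpr {
       Column(String),
       Literal(i64),
   }
   
   #[derive(Debug, PartialEq)]
   struct NotImplemented(String);
   
   impl fmt::Display for NotImplemented {
       fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
           write!(f, "This feature is not implemented: {}", self.0)
       }
   }
   
   /// Subquery variants have no arm of their own, so they hit the catch-all error.
   fn create_physical_expr(e: &LogicalExpr) -> Result<PhysicalExpr, NotImplemented> {
       match e {
           LogicalExpr::Column(name) => Ok(PhysicalExpr::Column(name.clone())),
           LogicalExpr::Literal(v) => Ok(PhysicalExpr::Literal(*v)),
           other => Err(NotImplemented(format!(
               "Physical plan does not support logical expression {other:?}"
           ))),
       }
   }
   
   fn main() {
       assert_eq!(
           create_physical_expr(&LogicalExpr::Column("a".into())),
           Ok(PhysicalExpr::Column("a".into()))
       );
       let subqueries = [
           LogicalExpr::Exists { subquery: "<subquery>".into(), negated: false },
           LogicalExpr::InSubquery {
               expr: Box::new(LogicalExpr::Column("a".into())),
               subquery: "<subquery>".into(),
               negated: false,
           },
           LogicalExpr::ScalarSubquery("<subquery>".into()),
       ];
       for e in &subqueries {
           let err = create_physical_expr(e).unwrap_err();
           println!("{err}");
           assert!(err.to_string().contains("Physical plan does not support logical expression"));
       }
   }
   ```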
   
   Even before your PR, the data source did not actually handle the subquery case:
   https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/core/src/datasource/file_format/mod.rs#L83-L89
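   
   One hypothetical way to make sure a scan never receives such a predicate, independent of the optimizer, is to split the filter into conjuncts and push down only the subquery-free ones, keeping the rest in a filter above the scan. This is a sketch assuming a toy `Expr` type; `split_conjunction`, `contains_subquery` and `partition_pushdown` are illustrative names, not the code linked above.
   
   ```rust
   /// Toy predicate type (hypothetical); the subquery itself is kept opaque.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Literal(i64),
       Gt(Box<Expr>, Box<Expr>),
       And(Box<Expr>, Box<Expr>),
       Exists(String), // stands in for `EXISTS (<subquery>)`
   }
   
   /// Flattens `p1 AND p2 AND ...` into its individual conjuncts.
   fn split_conjunction(e: &Expr, out: &mut Vec<Expr>) {
       match e {
           Expr::And(l, r) => {
               split_conjunction(l, out);
               split_conjunction(r, out);
           }
           other => out.push(other.clone()),
       }
   }
   
   /// True if the expression contains a subquery anywhere inside it.
   fn contains_subquery(e: &Expr) -> bool {
       match e {
           Expr::Exists(_) => true,
           Expr::Gt(l, r) | Expr::And(l, r) => contains_subquery(l) || contains_subquery(r),
           Expr::Column(_) | Expr::Literal(_) => false,
       }
   }
   
   /// Returns (conjuncts safe to push into the scan, conjuncts kept above the scan).
   fn partition_pushdown(predicate: &Expr) -> (Vec<Expr>, Vec<Expr>) {
       let mut conjuncts = Vec::new();
       split_conjunction(predicate, &mut conjuncts);
       conjuncts.into_iter().partition(|c| !contains_subquery(c))
   }
   
   fn main() {
       // WHERE a > 1 AND EXISTS (<subquery>)
       let a_gt_1 = Expr::Gt(Box::new(Expr::Column("a".into())), Box::new(Expr::Literal(1)));
       let pred = Expr::And(Box::new(a_gt_1.clone()), Box::new(Expr::Exists("<subquery>".into())));
       let (pushed, kept) = partition_pushdown(&pred);
       assert_eq!(pushed, vec![a_gt_1]);
       assert_eq!(kept, vec![Expr::Exists("<subquery>".into())]);
   }
   ```
   
   Here `a > 1` would reach the scan, while `EXISTS (<subquery>)` would stay in the filter above it.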
   
   So I think the right approach is to optimize non-correlated subqueries in the `decorrelate_where_exists` optimizer rule:
   https://github.com/apache/arrow-datafusion/blob/a1c60a1ba98e089d7551637f2a78663e66772d88/datafusion/optimizer/src/decorrelate_where_exists.rs#L185-L191
   
   Currently `decorrelate_where_exists` only optimizes correlated subqueries such as:
   ```
   SELECT t1.id
   FROM t1
   WHERE EXISTS
   (
      SELECT t2.id FROM t2 WHERE t1.id = t2.id
   )
   -- which it rewrites into:
   SELECT t1.id
   FROM t1 LEFT SEMI
   JOIN t2
   ON t1.id = t2.id
   ```
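   
   The equivalence behind that rewrite can be checked on in-memory data. This sketch assumes single-column tables of `i32` ids and compares evaluating the correlated `EXISTS` row by row with a hash-based left semi join; both keep each matching `t1` row once, no matter how many `t2` rows match it.
   
   ```rust
   use std::collections::HashSet;
   
   /// Row-by-row evaluation of:
   ///   SELECT t1.id FROM t1 WHERE EXISTS (SELECT t2.id FROM t2 WHERE t1.id = t2.id)
   fn exists_nested_loop(t1: &[i32], t2: &[i32]) -> Vec<i32> {
       t1.iter()
           .copied()
           .filter(|id1| t2.iter().any(|id2| id1 == id2))
           .collect()
   }
   
   /// Left semi join on t1.id = t2.id: emits a t1 row if any t2 row matches,
   /// without carrying t2 columns. Implemented here as a hash semi join.
   fn left_semi_join(t1: &[i32], t2: &[i32]) -> Vec<i32> {
       let build: HashSet<i32> = t2.iter().copied().collect();
       t1.iter().copied().filter(|id| build.contains(id)).collect()
   }
   
   fn main() {
       let t1 = [1, 2, 2, 3];
       let t2 = [2, 2, 3, 4];
       // Duplicate matches in t2 do not duplicate t1 rows, unlike an inner join.
       assert_eq!(exists_nested_loop(&t1, &t2), vec![2, 2, 3]);
       assert_eq!(left_semi_join(&t1, &t2), vec![2, 2, 3]);
   }
   ```
   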
   Maybe we need to add more rules to `decorrelate_where_exists`, for example:
   ```
   -- Rewrite a non-correlated EXISTS subquery to use a ScalarSubquery:
   WHERE EXISTS (SELECT A FROM B WHERE COL1 > 10)
   -- will be rewritten to:
   WHERE (SELECT 1 FROM (SELECT A FROM B WHERE COL1 > 10) LIMIT 1) IS NOT NULL
   ```
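   
   A minimal sketch of such a rule on a hypothetical plan representation (`Plan`, `Expr`, `OuterColumn` and `rewrite_uncorrelated_exists` are illustrative names, not DataFusion's types). A subquery counts as correlated if it contains any outer-column reference; the `NOT EXISTS` to `IS NULL` case is an extension of the proposal above, not part of it.
   
   ```rust
   /// Hypothetical, minimal logical plan (not DataFusion's `LogicalPlan`).
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       Scan { table: String },
       Filter { predicate: Expr, input: Box<Plan> },
       Projection { exprs: Vec<Expr>, input: Box<Plan> },
       Limit { fetch: usize, input: Box<Plan> },
   }
   
   /// Hypothetical, minimal expression type.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       OuterColumn(String), // reference to the enclosing query, i.e. a correlation
       Literal(i64),
       Gt(Box<Expr>, Box<Expr>),
       Eq(Box<Expr>, Box<Expr>),
       Exists { subquery: Box<Plan>, negated: bool },
       ScalarSubquery(Box<Plan>),
       IsNull(Box<Expr>),
       IsNotNull(Box<Expr>),
   }
   
   fn expr_is_correlated(e: &Expr) -> bool {
       match e {
           Expr::OuterColumn(_) => true,
           Expr::Column(_) | Expr::Literal(_) => false,
           Expr::Gt(l, r) | Expr::Eq(l, r) => expr_is_correlated(l) || expr_is_correlated(r),
           Expr::IsNull(x) | Expr::IsNotNull(x) => expr_is_correlated(x),
           // Conservative: an outer reference inside a nested subquery counts too.
           Expr::Exists { subquery, .. } | Expr::ScalarSubquery(subquery) => plan_is_correlated(subquery),
       }
   }
   
   fn plan_is_correlated(p: &Plan) -> bool {
       match p {
           Plan::Scan { .. } => false,
           Plan::Filter { predicate, input } => expr_is_correlated(predicate) || plan_is_correlated(input),
           Plan::Projection { exprs, input } => exprs.iter().any(expr_is_correlated) || plan_is_correlated(input),
           Plan::Limit { input, .. } => plan_is_correlated(input),
       }
   }
   
   /// EXISTS (sub)     =>  (SELECT 1 FROM (sub) LIMIT 1) IS NOT NULL
   /// NOT EXISTS (sub) =>  (SELECT 1 FROM (sub) LIMIT 1) IS NULL
   /// Correlated subqueries are returned unchanged (left to the semi-join rewrite).
   fn rewrite_uncorrelated_exists(e: Expr) -> Expr {
       match e {
           Expr::Exists { subquery, negated } if !plan_is_correlated(&subquery) => {
               let one_row = Plan::Limit {
                   fetch: 1,
                   input: Box::new(Plan::Projection { exprs: vec![Expr::Literal(1)], input: subquery }),
               };
               let scalar = Box::new(Expr::ScalarSubquery(Box::new(one_row)));
               if negated { Expr::IsNull(scalar) } else { Expr::IsNotNull(scalar) }
           }
           other => other,
       }
   }
   
   fn main() {
       // EXISTS (SELECT a FROM b WHERE col1 > 10): no outer references, so it is rewritten.
       let sub = Plan::Projection {
           exprs: vec![Expr::Column("a".into())],
           input: Box::new(Plan::Filter {
               predicate: Expr::Gt(Box::new(Expr::Column("col1".into())), Box::new(Expr::Literal(10))),
               input: Box::new(Plan::Scan { table: "b".into() }),
           }),
       };
       let rewritten = rewrite_uncorrelated_exists(Expr::Exists { subquery: Box::new(sub), negated: false });
       assert!(matches!(rewritten, Expr::IsNotNull(_)));
   
       // EXISTS (SELECT ... FROM t2 WHERE id = t1.id): correlated, left unchanged.
       let correlated = Expr::Exists {
           subquery: Box::new(Plan::Filter {
               predicate: Expr::Eq(Box::new(Expr::Column("id".into())), Box::new(Expr::OuterColumn("t1.id".into()))),
               input: Box::new(Plan::Scan { table: "t2".into() }),
           }),
           negated: false,
       };
       assert_eq!(rewrite_uncorrelated_exists(correlated.clone()), correlated);
   }
   ```
   
   The query from this issue, `EXISTS (SELECT count(*) FROM t2)`, contains no outer references, so a rule along these lines would apply to it. Within this model the correlation check is conservative: an outer reference anywhere blocks the rewrite, which can only miss opportunities rather than rewrite a correlated subquery incorrectly.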


[GitHub] [arrow-datafusion] crepererum commented on issue #5789: SQL case, This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>)

Posted by "crepererum (via GitHub)" <gi...@apache.org>.
crepererum commented on issue #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1491531423

   To me this looks like a bug. Who's trying to push down / apply a sub-query predicate to a Parquet file read? Shouldn't the logical optimizer remove these kinds of expressions?


Re: [I] SQL case, This feature is not implemented: Physical plan does not support logical expression EXISTS (<subquery>) [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #5789:
URL: https://github.com/apache/arrow-datafusion/issues/5789#issuecomment-1924686232

   For the record, this still happens:
   
   ```
   (venv-310) andrewlamb@Andrews-MacBook-Pro:~/Downloads$ datafusion-cli
   DataFusion CLI v35.0.0
   ❯ CREATE EXTERNAL TABLE t1 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   
   0 rows in set. Query took 0.030 seconds.
   
   ❯ CREATE EXTERNAL TABLE t2 (a INT, b INT) STORED AS CSV LOCATION 'data.csv';
   0 rows in set. Query took 0.001 seconds.
   
   ❯ SELECT a, b FROM t1 WHERE EXISTS (SELECT count(*) FROM t2);
   
   This feature is not implemented: Physical plan does not support logical expression Exists(Exists { subquery: <subquery>, negated: false })
   ❯
   ```

