Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/28 18:50:07 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #4006: Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:"

alamb opened a new issue, #4006:
URL: https://github.com/apache/arrow-datafusion/issues/4006

   **Describe the bug**
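   DataFusion produces different results for the same query depending on whether parquet filter pushdown is enabled: without pushdown the query returns a count, with pushdown it fails with an internal error.
   
   Note that pushdown filtering is not enabled by default (we are still working on it), so this issue is unlikely to affect users.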
   
   **To Reproduce**
   1. Download the data from
   [repro.zip](https://github.com/apache/arrow-datafusion/files/9890904/repro.zip)
   2. Run `datafusion-cli`
   
   The query being run is:
   ```sql
   select count(*) from foo where request_duration_ns > 791684060 OR client_addr NOT in ('213.120.214.213');
   ```
   
   **Expected behavior**
   The same answer should be produced with and without row filtering enabled. However, with row filtering enabled an error is produced instead.
   
   Without it enabled (the default):
   
   ```shell
   datafusion-cli -f script.sql 
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 53819           |
   +-----------------+
   1 row in set. Query took 0.006 seconds.
   ```
   
   With it enabled:
   
   ```shell
   DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true datafusion-cli -f script.sql 
   ...
   1 row in set. Query took 0.002 seconds.
   ArrowError(ExternalError(Execution("Arrow error: External error: Arrow: underlying Arrow error: Compute error: Error evaluating filter predicate: Internal(\"Cannot evaluate binary expression Gt with types Utf8 and Int32\")")))
   ```
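   
   A minimal sketch of how the two runs above could be compared automatically. The `compare_pushdown` helper is hypothetical (it is not part of datafusion-cli); it only assumes the `DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS` environment variable shown above, and it drops the `Query took` timing line because that differs from run to run.
   
   ```shell
   # Hypothetical helper: run the given command twice, once with the default
   # settings and once with parquet filter pushdown enabled, then compare output.
   compare_pushdown() {
      # "$@" is the command to run; 2>&1 captures errors as well as results.
      # sed removes the timing line, which is expected to vary between runs.
      off=$("$@" 2>&1 | sed '/Query took/d')
      on=$(DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true "$@" 2>&1 | sed '/Query took/d')
      if [ "$off" = "$on" ]; then
         echo "same"
      else
         echo "different"
         return 1
      fi
   }
   ```
   
   It could be invoked as `compare_pushdown datafusion-cli -f script.sql`, where `script.sql` is the script from the reproduction (its contents are not shown in this report).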
   
   **Additional context**
   Found by the test here https://github.com/apache/arrow-datafusion/pull/3976


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on issue #4006: Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:"

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #4006:
URL: https://github.com/apache/arrow-datafusion/issues/4006#issuecomment-1295950696

   It looks like this has the same underlying cause as https://github.com/apache/arrow-datafusion/issues/4005#issuecomment-1295949956
   
   Reordering the predicates so that the `client_addr` check comes first works:
   
   ```
   ❯ select count(*) from foo where client_addr NOT in ('213.120.214.213') OR request_duration_ns > 791684060;
   +-----------------+
   | COUNT(UInt8(1)) |
   +-----------------+
   | 53819           |
   +-----------------+
   1 row in set. Query took 0.247 seconds.
   ```




[GitHub] [arrow-datafusion] tustvold closed issue #4006: Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:"

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #4006: Internal error when parquet predicate pushdown is enabled "Error evaluating filter predicate:"
URL: https://github.com/apache/arrow-datafusion/issues/4006

