You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "stuartcarnie (via GitHub)" <gi...@apache.org> on 2023/04/20 13:39:09 UTC
[GitHub] [arrow-datafusion] stuartcarnie opened a new issue, #6072: DataFusion produces incorrect results with certain filter expressions
stuartcarnie opened a new issue, #6072:
URL: https://github.com/apache/arrow-datafusion/issues/6072
### Describe the bug
Predicates which combine multiple `true` literals using a disjunction operator followed by a conjunction are incorrectly simplified, causing incorrect results.
For example, executing the following query:
```sql
WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
```
Produces the following, incorrect results, as the row containing `cpu1` for the column `cpu` should not be included:
```
+------+------+
| time | cpu |
+------+------+
| 0 | cpu0 |
| 1 | cpu1 |
+------+------+
2 rows in set. Query took 0.004 seconds.
```
### To Reproduce
```sql
WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0')
```
### Expected behavior
Produce the following results:
```
+------+------+
| time | cpu |
+------+------+
| 0 | cpu0 |
+------+------+
1 row in set. Query took 0.004 seconds.
```
### Additional context
Running `EXPLAIN VERBOSE` indicates the `simplify_expressions` rule is incorrectly rewriting the filter node:
```text
❯ explain verbose WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| plan_type | plan |
+------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
| initial_logical_plan | Projection: time, cpu |
| | Filter: (time = Int64(0) OR time = Int64(1)) AND (Boolean(true) OR Boolean(true) AND cpu = Utf8("cpu0")) |
| | Projection: t.column1 AS time, t.column2 AS cpu |
| | SubqueryAlias: t |
| | Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1")) |
| logical_plan after inline_table_scan | SAME TEXT AS ABOVE |
| logical_plan after type_coercion | SAME TEXT AS ABOVE |
| logical_plan after simplify_expressions | Projection: time, cpu |
| | Filter: time = Int64(0) OR time = Int64(1) |
| | Projection: t.column1 AS time, t.column2 AS cpu |
| | SubqueryAlias: t |
| | Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1")) |
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] stuartcarnie commented on issue #6072: DataFusion produces incorrect results with certain filter expressions
Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on issue #6072:
URL: https://github.com/apache/arrow-datafusion/issues/6072#issuecomment-1516419215
This isn't a bug, as its the correct precedence:
```
(true or true and cpu = 'cpu0')
```
is the same
```
(true or (true and cpu = 'cpu0'))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] stuartcarnie closed issue #6072: DataFusion produces incorrect results with certain filter expressions
Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie closed issue #6072: DataFusion produces incorrect results with certain filter expressions
URL: https://github.com/apache/arrow-datafusion/issues/6072
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org