You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "stuartcarnie (via GitHub)" <gi...@apache.org> on 2023/04/20 13:39:09 UTC

[GitHub] [arrow-datafusion] stuartcarnie opened a new issue, #6072: DataFusion produces incorrect results with certain filter expressions

stuartcarnie opened a new issue, #6072:
URL: https://github.com/apache/arrow-datafusion/issues/6072

   ### Describe the bug
   
   Predicates which combine multiple `true` literals using a disjunction operator followed by a conjunction are incorrectly simplified, causing incorrect results.
   
   For example, executing the following query:
   
   ```sql
   WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
   ```
   
   Produces the following, incorrect results, as the row containing `cpu1` for the column `cpu` should not be included:
   
   ```
   +------+------+
   | time | cpu  |
   +------+------+
   | 0    | cpu0 |
   | 1    | cpu1 |
   +------+------+
   2 rows in set. Query took 0.004 seconds.
   ```
   
   ### To Reproduce
   
   ```sql
   WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0')
   ```
   
   ### Expected behavior
   
   Produce the following results:
   
   ```
   +------+------+
   | time | cpu  |
   +------+------+
   | 0    | cpu0 |
   +------+------+
   1 row in set. Query took 0.004 seconds.
   ```
   
   ### Additional context
   
   Running `EXPLAIN VERBOSE` indicates the `simplify_expressions` rule is incorrectly rewriting the filter node:
   
   ```text
   ❯ explain verbose WITH t(time, cpu) AS (VALUES (0, 'cpu0'), (1, 'cpu1')) SELECT * from t WHERE (time = 0 OR time = 1) AND (true OR true AND cpu = 'cpu0');
   +------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
   | plan_type                                                  | plan                                                                                                       |
   +------------------------------------------------------------+------------------------------------------------------------------------------------------------------------+
   | initial_logical_plan                                       | Projection: time, cpu                                                                                      |
   |                                                            |   Filter: (time = Int64(0) OR time = Int64(1)) AND (Boolean(true) OR Boolean(true) AND cpu = Utf8("cpu0")) |
   |                                                            |     Projection: t.column1 AS time, t.column2 AS cpu                                                        |
   |                                                            |       SubqueryAlias: t                                                                                     |
   |                                                            |         Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1"))                                         |
   | logical_plan after inline_table_scan                       | SAME TEXT AS ABOVE                                                                                         |
   | logical_plan after type_coercion                           | SAME TEXT AS ABOVE                                                                                         |
   | logical_plan after simplify_expressions                    | Projection: time, cpu                                                                                      |
   |                                                            |   Filter: time = Int64(0) OR time = Int64(1)                                                               |
   |                                                            |     Projection: t.column1 AS time, t.column2 AS cpu                                                        |
   |                                                            |       SubqueryAlias: t                                                                                     |
   |                                                            |         Values: (Int64(0), Utf8("cpu0")), (Int64(1), Utf8("cpu1"))                                         |
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] stuartcarnie commented on issue #6072: DataFusion produces incorrect results with certain filter expressions

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on issue #6072:
URL: https://github.com/apache/arrow-datafusion/issues/6072#issuecomment-1516419215

   This isn't a bug, as its the correct precedence:
   
   ```
   (true or true and cpu = 'cpu0')
   ```
   
   is the same
   
   ```
   (true or (true and cpu = 'cpu0'))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] stuartcarnie closed issue #6072: DataFusion produces incorrect results with certain filter expressions

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie closed issue #6072: DataFusion produces incorrect results with certain filter expressions
URL: https://github.com/apache/arrow-datafusion/issues/6072


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org