You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tv42 (via GitHub)" <gi...@apache.org> on 2023/12/08 22:47:58 UTC

[I] CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres [arrow-datafusion]

tv42 opened a new issue, #8475:
URL: https://github.com/apache/arrow-datafusion/issues/8475

   ### Describe the bug
   
   I found this in the sqlite sqllogictest repository, `random/expr/slt_good_3.slt`:
   
   ```
   query I rowsort label-676
   SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59
   ----
   -337914
   ```
   
   Just to confirm with sqlite3:
   
   ```
   $ sqlite3
   SQLite version 3.43.2 2023-10-10 12:14:04
   Enter ".help" for usage hints.
   Connected to a transient in-memory database.
   Use ".open FILENAME" to reopen on a persistent database.
   sqlite> SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
   -337914
   sqlite>
   ```
   
   Datafusion disagrees:
   
   ```
   $ datafusion-cli
   DataFusion CLI v33.0.0
   ❯ SELECT ALL - CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
   +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | (- CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END) + Int64(93) + (- Int64(-44)) * Int64(91) + CASE Int64(44) WHEN (- Int64(-21)) * Int64(69) - Int64(12) THEN Int64(58) ELSE Int64(-3) END * Int64(23) * Int64(84) * (- Int64(-59)) |
   +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | -337868                                                                                                                                                                                                                                                                           |
   +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.017 seconds.
   
   ❯
   ```
   
   With SQLite, the expression simplifies to
   
   ```
   SELECT - 47 + 93 + - - 44 * 91 + CASE + 44 WHEN - - 21 * 69 - 12 THEN 58 ELSE - 3 END * + + 23 * + 84 * - - 59;
   ```
   
   and then
   
   ```
   SELECT - 47 + 93 + - - 44 * 91 + - 3 * + + 23 * + 84 * - - 59;
   ```
   
   which Datafusion agrees is -337914. Hence, Datafusion must have evaluated one of the cases differently!
   
   ```
   sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
   47
   ```
   
   ```
   ❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
   +----------------------------------------------------------------------------------------------+
   | CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
   +----------------------------------------------------------------------------------------------+
   | 1                                                                                            |
   +----------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.015 seconds.
   ```
   
   Trying to isolate that further, SQLite thinks `NOT - AVG ( - 41 ) IS NULL` is `1`, Datafusion thinks it's `true` which is the same with types, and Datafusion handles that `CASE` in isolation just fine:
   
   ```
   sqlite> SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
   47
   sqlite> select NOT - AVG ( - 41 ) IS NULL;
   1
   sqlite> select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
   yes
   ```
   
   ```
   ❯ SELECT CASE WHEN NOT - AVG ( - 41 ) IS NULL THEN 47 WHEN NULL IS NULL THEN COUNT ( * ) END;
   +----------------------------------------------------------------------------------------------+
   | CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Int64(47) WHEN NULL IS NULL THEN COUNT(*) END |
   +----------------------------------------------------------------------------------------------+
   | 1                                                                                            |
   +----------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.014 seconds.
   
   ❯ select NOT - AVG ( - 41 ) IS NULL;
   +---------------------------------+
   | NOT (- AVG(Int64(-41))) IS NULL |
   +---------------------------------+
   | true                            |
   +---------------------------------+
   1 row in set. Query took 0.011 seconds.
   
   ❯ select case when NOT - AVG ( - 41 ) IS NULL then 'yes' else 'no' end;
   +--------------------------------------------------------------------------------+
   | CASE WHEN NOT (- AVG(Int64(-41))) IS NULL THEN Utf8("yes") ELSE Utf8("no") END |
   +--------------------------------------------------------------------------------+
   | yes                                                                            |
   +--------------------------------------------------------------------------------+
   1 row in set. Query took 0.014 seconds.
   ```
   
   But when it's part of that larger statement something goes wrong.
   
   And this is it:
   
   ```
   SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN NULL IS NULL THEN 'datafusion' END;
   sqlite and postgres
   ```
   
   ```
   ❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN NULL IS NULL THEN 'datafusion' END;
   +--------------------------------------------------------------------------------------------------------+
   | CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN NULL IS NULL THEN Utf8("datafusion") END |
   +--------------------------------------------------------------------------------------------------------+
   | datafusion                                                                                             |
   +--------------------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.005 seconds.
   ```
   
   
   ### To Reproduce
   
   ```
   ❯ SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN NULL IS NULL THEN 'datafusion' END;
   +--------------------------------------------------------------------------------------------------------+
   | CASE WHEN Boolean(true) THEN Utf8("sqlite and postgres") WHEN NULL IS NULL THEN Utf8("datafusion") END |
   +--------------------------------------------------------------------------------------------------------+
   | datafusion                                                                                             |
   +--------------------------------------------------------------------------------------------------------+
   1 row in set. Query took 0.005 seconds.
   ```
   
   ### Expected behavior
   
   ```
   sqlite> SELECT CASE WHEN true THEN 'sqlite and postgres' WHEN NULL IS NULL THEN 'datafusion' END;
   sqlite and postgres
   ```
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres [arrow-datafusion]

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #8475: CASE WHEN doesn't pick first true branch? Differs from SQLite/Postgres
URL: https://github.com/apache/arrow-datafusion/issues/8475


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org