You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "milevin (via GitHub)" <gi...@apache.org> on 2023/03/22 18:15:29 UTC

[GitHub] [arrow-datafusion] milevin opened a new issue, #5694: Spurious cast to Uint8 inserted when coalesce is used in a context of a join

milevin opened a new issue, #5694:
URL: https://github.com/apache/arrow-datafusion/issues/5694

   ### Describe the bug
   
   Following is a valid (as far as I can tell) SQL that blows up in DataFusion:
   
   ```
   DataFusion CLI v20.0.0
   ❯ with customer as (
     select 11111 as c_custkey
   ), cust as (
     select case when c_custkey = 0 then null else c_custkey end as c_custkey from customer
   )
   select coalesce(a.c_custkey, b.c_custkey) as custkey
   from cust a inner join cust b
   on a.c_custkey = b.c_custkey;
   Arrow error: Cast error: Can't cast value 11111 to type UInt8
   ```
   
   The culprit seems to be a spurious cast to UInt8 that gets inserted in coalesce arguments:
   
   ```
   ❯ explain with customer as (
     select 11111 as c_custkey
   ), cust as (
     select case when c_custkey = 0 then null else c_custkey end as c_custkey from customer
   )
   select coalesce(a.c_custkey, b.c_custkey) as custkey
   from cust a inner join cust b
   on a.c_custkey = b.c_custkey;
   +---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type     | plan                                                                                                                                         |
   +---------------+----------------------------------------------------------------------------------------------------------------------------------------------+
   | logical_plan  | Projection: coalesce(CAST(a.c_custkey AS UInt8), CAST(b.c_custkey AS UInt8)) AS custkey                                                      |
   |               |   Inner Join: a.c_custkey = b.c_custkey                                                                                                      |
   |               |     SubqueryAlias: a                                                                                                                         |
   |               |       SubqueryAlias: cust                                                                                                                    |
   |               |         Projection: CASE WHEN customer.c_custkey = Int64(0) THEN Int64(NULL) ELSE customer.c_custkey END AS c_custkey                        |
   |               |           SubqueryAlias: customer                                                                                                            |
   |               |             Projection: Int64(11111) AS c_custkey                                                                                            |
   |               |               EmptyRelation                                                                                                                  |
   |               |     SubqueryAlias: b                                                                                                                         |
   |               |       SubqueryAlias: cust                                                                                                                    |
   |               |         Projection: CASE WHEN customer.c_custkey = Int64(0) THEN Int64(NULL) ELSE customer.c_custkey END AS c_custkey                        |
   |               |           SubqueryAlias: customer                                                                                                            |
   |               |             Projection: Int64(11111) AS c_custkey                                                                                            |
   |               |               EmptyRelation                                                                                                                  |
   ...
   ```
   
   ### To Reproduce
   
   See above repro in the CLI
   
   ### Expected behavior
   
   The query above is expected to succeed
   
   ### Additional context
   
   Any simplification to the above query I tried (e.g. removing join, case, or coalesce) makes the issue go away.
   
   E.g. here is a query where b.custkey is replaced by 0 in the coalesce call:
   
   ```
   ❯ with customer as (
     select 11111 as c_custkey
   ), cust as (
     select case when c_custkey = 0 then null else c_custkey end as c_custkey from customer
   )
   select coalesce(a.c_custkey, 0) as custkey
   from cust a inner join cust b
   on a.c_custkey = b.c_custkey;
   +---------+
   | custkey |
   +---------+
   | 11111   |
   +---------+
   1 row in set. Query took 0.021 seconds.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org