Posted to issues@spark.apache.org by "Yuming Wang (Jira)" <ji...@apache.org> on 2020/08/18 09:29:00 UTC

[jira] [Assigned] (SPARK-29274) Should not coerce decimal type to double type when it's join column

     [ https://issues.apache.org/jira/browse/SPARK-29274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yuming Wang reassigned SPARK-29274:
-----------------------------------

    Assignee:     (was: Pengfei Chang)

> Should not coerce decimal type to double type when it's join column
> -------------------------------------------------------------------
>
>                 Key: SPARK-29274
>                 URL: https://issues.apache.org/jira/browse/SPARK-29274
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.4, 2.4.4, 3.0.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: image-2019-09-27-20-20-24-238.png
>
>
> How to reproduce this issue:
> {code:sql}
> create table t1 (incdata_id decimal(21,0), v string) using parquet;
> create table t2 (incdata_id string, v string) using parquet;
> explain select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
> == Physical Plan ==
> *(5) SortMergeJoin [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double)))], [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double)))], Inner
> :- *(2) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))) ASC NULLS FIRST], false, 0
> :  +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#31 as double))), 200), true, [id=#104]
> :     +- *(1) Filter isnotnull(incdata_id#31)
> :        +- Scan hive default.t1 [incdata_id#31, v#32], HiveTableRelation `default`.`t1`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#31, v#32], Statistics(sizeInBytes=8.0 EiB)
> +- *(4) Sort [knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))) ASC NULLS FIRST], false, 0
>    +- Exchange hashpartitioning(knownfloatingpointnormalized(normalizenanandzero(cast(incdata_id#33 as double))), 200), true, [id=#112]
>       +- *(3) Filter isnotnull(incdata_id#33)
>          +- Scan hive default.t2 [incdata_id#33, v#34], HiveTableRelation `default`.`t2`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, [incdata_id#33, v#34], Statistics(sizeInBytes=8.0 EiB)
> {code}
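> The plan shows both join keys wrapped in cast(... as double). With some hypothetical rows (not part of the original report), the coerced join would then match ids that are actually different; a minimal sketch, assuming the tables above:
> {code:sql}
> -- Hypothetical data to illustrate the wrong match.
> insert into t1 values (cast('100000000001636981212' as decimal(21, 0)), 'a');
> insert into t2 values ('100000000001636981213', 'b');
> -- Both keys are coerced to double, so these two different ids compare equal
> -- and the join returns a row it should not.
> select * from t1 join t2 on (t1.incdata_id = t2.incdata_id);
> {code}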
> {code:sql}
> select cast(v1 as double) as v3, cast(v2 as double) as v4,
>   cast(v1 as double) = cast(v2 as double), v1 = v2 
> from (select cast('100000000001636981212' as decimal(21, 0)) as v1,
>       cast('100000000001636981213' as decimal(21, 0)) as v2) t;
> 1.0000000000163697E20	1.0000000000163697E20	true	false
> {code}
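> A double carries only about 15-17 significant decimal digits, so the two 21-digit decimals above round to the same double value and compare equal even though they differ. Until the coercion rule is changed, one possible workaround (a sketch, not a committed fix) is to cast the string side to decimal explicitly so the join keys stay decimal:
> {code:sql}
> -- Hypothetical workaround: compare as decimal(21,0) instead of letting
> -- Spark coerce both sides to double.
> explain select * from t1 join t2
>   on (t1.incdata_id = cast(t2.incdata_id as decimal(21, 0)));
> {code}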
>  
> It's a real case from our production:
> !image-2019-09-27-20-20-24-238.png|width=100%!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org