You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/02/07 22:01:00 UTC
[jira] [Created] (SPARK-38133) Grouping by timestamp_ntz will sometimes corrupt the results
Bruce Robbins created SPARK-38133:
-------------------------------------
Summary: Grouping by timestamp_ntz will sometimes corrupt the results
Key: SPARK-38133
URL: https://issues.apache.org/jira/browse/SPARK-38133
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.0
Reporter: Bruce Robbins
Assume this data:
{noformat}
create or replace temp view v1 as
select * from values
(1, timestamp_ntz'2012-01-01 00:00:00', 10000),
(2, timestamp_ntz'2012-01-01 00:00:00', 20000),
(1, timestamp_ntz'2012-01-01 00:00:00', 5000),
(1, timestamp_ntz'2013-01-01 00:00:00', 48000),
(2, timestamp_ntz'2013-01-01 00:00:00', 30000)
as data(a, b, c);
{noformat}
Run the following query:
{noformat}
select *
from v1
pivot (
sum(c)
for a in (1, 2)
);
{noformat}
You get incorrect results for the group-by column:
{noformat}
2012-01-01 19:05:19.476736 15000 20000
2013-01-01 19:05:19.476736 48000 30000
Time taken: 2.65 seconds, Fetched 2 row(s)
{noformat}
Actually, _whenever_ the TungstenAggregationIterator is used to group by a timestamp_ntz column, you get incorrect results:
{noformat}
set spark.sql.codegen.wholeStage=false;
select a, b, sum(c) from v1 group by a, b;
{noformat}
This query produces
{noformat}
2 2012-01-01 09:32:39.738368 20000
1 2013-01-01 09:32:39.738368 48000
2 2013-01-01 09:32:39.738368 30000
Time taken: 1.927 seconds, Fetched 4 row(s)
{noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org