Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/11/08 01:57:00 UTC
[jira] [Created] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations
Bruce Robbins created SPARK-41035:
-------------------------------------
Summary: Incorrect results or NPE when a literal is reused across distinct aggregations
Key: SPARK-41035
URL: https://issues.apache.org/jira/browse/SPARK-41035
Project: Spark
Issue Type: Bug
Components: SQL
Affects Versions: 3.3.1, 3.2.2, 3.4.0
Reporter: Bruce Robbins
This query produces incorrect results:
{noformat}
select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
from values (1, 2), (4, 5) as data(a, b)
group by a;
+---+----+----+
|a  |cnt1|cnt2|
+---+----+----+
|1  |1   |0   |
|4  |1   |0   |
+---+----+----+
{noformat}
The values for {{cnt2}} should be 1 and 1 (not 0 and 0).
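To make the expected values concrete, here is a minimal sketch in plain Python (not Spark code) of the documented semantics of {{count(distinct e1, e2, ...)}}: within each group, count the distinct tuples in which every expression is non-null. The helper name {{count_distinct}} and its lambda-based interface are hypothetical and exist only for this illustration.

```python
from collections import defaultdict

def count_distinct(rows, key, *exprs):
    """Reference model of COUNT(DISTINCT e1, e2, ...) grouped by key.

    Hypothetical helper for illustration only; it is not Spark's implementation.
    A row contributes only if every expression evaluates to a non-NULL value.
    """
    seen = defaultdict(set)
    for row in rows:
        group = key(row)
        seen[group]  # make sure the group appears even if nothing is counted
        values = tuple(e(row) for e in exprs)
        if all(v is not None for v in values):
            seen[group].add(values)
    return {g: len(s) for g, s in seen.items()}

data = [(1, 2), (4, 5)]  # rows of data(a, b) from the query above

# count(distinct 100)
cnt1 = count_distinct(data, lambda r: r[0], lambda r: 100)
# count(distinct b, 100)
cnt2 = count_distinct(data, lambda r: r[0], lambda r: r[1], lambda r: 100)

print(cnt1)
print(cnt2)
```

Under this model each group contains exactly one non-null {{(b, 100)}} tuple, so {{cnt2}} is 1 for both {{a = 1}} and {{a = 4}}.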
If you change the literal in the first aggregate function so that it no longer matches the literal in the second, the second aggregate function returns the correct result:
{noformat}
select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2
from values (1, 2), (4, 5) as data(a, b)
group by a;
+---+----+----+
|a  |cnt1|cnt2|
+---+----+----+
|1  |1   |1   |
|4  |1   |1   |
+---+----+----+
{noformat}
The same bug causes the following query to fail with a NullPointerException:
{noformat}
select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
from values (1, 2), (4, 5) as data(a, b)
group by a;
{noformat}
If you change the literal used in the first aggregation, then the query succeeds:
{noformat}
select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
from values (1, 2), (4, 5) as data(a, b)
group by a;
+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|a  |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1)                                                                                                                                            |
+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1  |1                |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
|4  |1                |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{noformat}