Posted to issues@spark.apache.org by "Bruce Robbins (Jira)" <ji...@apache.org> on 2022/11/08 01:57:00 UTC
[jira] [Created] (SPARK-41035) Incorrect results or NPE when a literal is reused across distinct aggregations

Bruce Robbins created SPARK-41035:
-------------------------------------

             Summary: Incorrect results or NPE when a literal is reused across distinct aggregations
                 Key: SPARK-41035
                 URL: https://issues.apache.org/jira/browse/SPARK-41035
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.3.1, 3.2.2, 3.4.0
            Reporter: Bruce Robbins


This query produces incorrect results:
{noformat}
select a, count(distinct 100) as cnt1, count(distinct b, 100) as cnt2
from values (1, 2), (4, 5) as data(a, b)
group by a;

+---+----+----+
|a  |cnt1|cnt2|
+---+----+----+
|1  |1   |0   |
|4  |1   |0   |
+---+----+----+
{noformat}
The values for {{cnt2}} should be 1 and 1 (not 0 and 0).
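As a sanity check on the expected values, here is a small Python model of the documented semantics of {{count(DISTINCT expr[, expr...])}}. Per group, that function counts the unique tuples in which no element is NULL. This is an illustrative sketch and not Spark's implementation. The helpers {{count_distinct}} and {{expected}} are hypothetical names used only here.

```python
from collections import defaultdict

# Rows from the repro: VALUES (1, 2), (4, 5) AS data(a, b)
rows = [(1, 2), (4, 5)]


def count_distinct(tuples):
    """Hypothetical model of COUNT(DISTINCT e1, e2, ...): the number of
    unique tuples in which no element is NULL (modelled as None)."""
    return len({t for t in tuples if all(v is not None for v in t)})


def expected(rows):
    """Group rows by `a`, then compute cnt1 = count(distinct 100)
    and cnt2 = count(distinct b, 100) for each group."""
    groups = defaultdict(list)
    for a, b in rows:
        groups[a].append(b)
    result = {}
    for a, bs in sorted(groups.items()):
        cnt1 = count_distinct((100,) for _ in bs)
        cnt2 = count_distinct((b, 100) for b in bs)
        result[a] = (cnt1, cnt2)
    return result


print(expected(rows))  # {1: (1, 1), 4: (1, 1)}
```

Each group contains exactly one non-null tuple {{(b, 100)}}, so {{cnt2}} should be 1 for both groups. The reused literal should make no difference to the result.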

If you change the literal used in the first aggregate function, the second aggregate function now works correctly:
{noformat}
select a, count(distinct 101) as cnt1, count(distinct b, 100) as cnt2
from values (1, 2), (4, 5) as data(a, b)
group by a;

+---+----+----+
|a  |cnt1|cnt2|
+---+----+----+
|1  |1   |1   |
|4  |1   |1   |
+---+----+----+
{noformat}
The same bug causes the following query to fail with a NullPointerException:
{noformat}
select a, count(distinct 1), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
from values (1, 2), (4, 5) as data(a, b)
group by a;
{noformat}
If you change the literal used in the first aggregation, the query succeeds:
{noformat}
select a, count(distinct 2), count_min_sketch(distinct b, 0.5d, 0.5d, 1)
from values (1, 2), (4, 5) as data(a, b)
group by a;

+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|a  |count(DISTINCT 2)|count_min_sketch(DISTINCT b, 0.5, 0.5, 1)                                                                                                                                            |
+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
|1  |1                |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
|4  |1                |[00 00 00 01 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00 04 00 00 00 00 5D 8D 6A B9 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00]|
+---+-----------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
