You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/06/18 19:02:00 UTC
[jira] [Created] (IMPALA-10755) Wrong results for a query with
predicate on an analytic function
Aman Sinha created IMPALA-10755:
-----------------------------------
Summary: Wrong results for a query with predicate on an analytic function
Key: IMPALA-10755
URL: https://issues.apache.org/jira/browse/IMPALA-10755
Project: IMPALA
Issue Type: Bug
Components: Frontend
Affects Versions: Impala 4.0
Reporter: Aman Sinha
In the master branch, the following query with equality predicate on analytic function produces 13 rows whereas the correct result is 3 rows (as verified with Hive):
{noformat}
use tpch;
select *
FROM (
SELECT
l_quantity, l_linenumber,
nr_pvp,
max(nr_pvp) over (partition BY l_quantity) max_nr_pvp
FROM (
SELECT
l_quantity, l_linenumber,
count(1) AS nr_pvp
FROM
lineitem, partsupp
where l_partkey = ps_partkey
and ps_suppkey = 10 and l_quantity in (1, 2)
GROUP BY l_quantity, l_linenumber
) a
) b
WHERE nr_pvp = max_nr_pvp
order by 1, 2, 3, 4;
{noformat}
Wrong results:
{noformat}
+------------+--------------+--------+------------+
| l_quantity | l_linenumber | nr_pvp | max_nr_pvp |
+------------+--------------+--------+------------+
| 1.00 | 1 | 11 | 12 |
| 1.00 | 2 | 12 | 12 |
| 1.00 | 3 | 10 | 12 |
| 1.00 | 4 | 5 | 12 |
| 1.00 | 5 | 4 | 12 |
| 1.00 | 6 | 5 | 12 |
| 1.00 | 7 | 2 | 12 |
| 2.00 | 1 | 12 | 12 |
| 2.00 | 2 | 12 | 12 |
| 2.00 | 3 | 11 | 12 |
| 2.00 | 4 | 8 | 12 |
| 2.00 | 5 | 3 | 12 |
| 2.00 | 6 | 2 | 12 |
+------------+--------------+--------+------------+
Fetched 13 row(s) in 1.36s
{noformat}
Expected results (verified with Hive):
{noformat}
+---------------+-----------------+-----------+---------------+
| b.l_quantity | b.l_linenumber | b.nr_pvp | b.max_nr_pvp |
+---------------+-----------------+-----------+---------------+
| 1.00 | 2 | 12 | 12 |
| 2.00 | 1 | 12 | 12 |
| 2.00 | 2 | 12 | 12 |
+---------------+-----------------+-----------+---------------+
{noformat}
Explain plan for Impala shows that the equality predicate 'nr_pvp = max_nr_pvp' which is supposed to be applied on top of the analytic operator is missing:
{noformat}
+---------------------------------------------------------------------------+
| Explain String |
+---------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=42.88MB Threads=7 |
| Per-Host Resource Estimates: Memory=483MB |
| |
| PLAN-ROOT SINK |
| | |
| 11:MERGING-EXCHANGE [UNPARTITIONED] |
| | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
| | |
| 06:SORT |
| | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
| | row-size=28B cardinality=357 |
| | |
| 05:ANALYTIC |
| | functions: max(count(*)) |
| | partition by: l_quantity |
| | row-size=28B cardinality=357 |
| | |
| 04:SORT |
| | order by: l_quantity ASC NULLS LAST |
| | row-size=20B cardinality=357 |
| | |
| 10:EXCHANGE [HASH(l_quantity)] |
| | |
| 09:AGGREGATE [FINALIZE] |
| | output: count:merge(*) |
| | group by: l_quantity, l_linenumber |
| | row-size=20B cardinality=357 |
| | |
| 08:EXCHANGE [HASH(l_quantity,l_linenumber)] |
| | |
| 03:AGGREGATE [STREAMING] |
| | output: count(*) |
| | group by: l_quantity, l_linenumber |
| | row-size=20B cardinality=357 |
| | |
| 02:HASH JOIN [INNER JOIN, BROADCAST] |
| | hash predicates: l_partkey = ps_partkey |
| | runtime filters: RF000 <- ps_partkey |
| | row-size=36B cardinality=2.45K |
| | |
| |--07:EXCHANGE [BROADCAST] |
| | | |
| | 01:SCAN HDFS [tpch.partsupp] |
| | HDFS partitions=1/1 files=1 size=112.71MB |
| | predicates: ps_suppkey = 10 |
| | row-size=16B cardinality=82 |
| | |
| 00:SCAN HDFS [tpch.lineitem] |
| HDFS partitions=1/1 files=1 size=718.94MB |
| predicates: l_quantity IN (1, 2) |
| runtime filters: RF000 -> l_partkey |
| row-size=20B cardinality=235.34K |
+---------------------------------------------------------------------------+
{noformat}
This is likely a regression since based on some internal testing this was working correctly on an earlier version.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)