You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/06/18 21:58:00 UTC
[jira] [Assigned] (IMPALA-10755) Wrong results for a query with
predicate on an analytic function
[ https://issues.apache.org/jira/browse/IMPALA-10755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha reassigned IMPALA-10755:
-----------------------------------
Assignee: Aman Sinha
> Wrong results for a query with predicate on an analytic function
> ----------------------------------------------------------------
>
> Key: IMPALA-10755
> URL: https://issues.apache.org/jira/browse/IMPALA-10755
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 4.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
> Priority: Critical
> Labels: wrongresults
>
> In the master branch, the following query with equality predicate on analytic function produces 13 rows whereas the correct result is 3 rows (as verified with Hive):
> {noformat}
> use tpch;
> select *
> FROM (
> SELECT
> l_quantity, l_linenumber,
> nr_pvp,
> max(nr_pvp) over (partition BY l_quantity) max_nr_pvp
> FROM (
> SELECT
> l_quantity, l_linenumber,
> count(1) AS nr_pvp
> FROM
> lineitem, partsupp
> where l_partkey = ps_partkey
> and ps_suppkey = 10 and l_quantity in (1, 2)
> GROUP BY l_quantity, l_linenumber
> ) a
> ) b
> WHERE nr_pvp = max_nr_pvp
> order by 1, 2, 3, 4;
> {noformat}
> Wrong results:
> {noformat}
> +------------+--------------+--------+------------+
> | l_quantity | l_linenumber | nr_pvp | max_nr_pvp |
> +------------+--------------+--------+------------+
> | 1.00 | 1 | 11 | 12 |
> | 1.00 | 2 | 12 | 12 |
> | 1.00 | 3 | 10 | 12 |
> | 1.00 | 4 | 5 | 12 |
> | 1.00 | 5 | 4 | 12 |
> | 1.00 | 6 | 5 | 12 |
> | 1.00 | 7 | 2 | 12 |
> | 2.00 | 1 | 12 | 12 |
> | 2.00 | 2 | 12 | 12 |
> | 2.00 | 3 | 11 | 12 |
> | 2.00 | 4 | 8 | 12 |
> | 2.00 | 5 | 3 | 12 |
> | 2.00 | 6 | 2 | 12 |
> +------------+--------------+--------+------------+
> Fetched 13 row(s) in 1.36s
> {noformat}
> Expected results (verified with Hive):
> {noformat}
> +---------------+-----------------+-----------+---------------+
> | b.l_quantity | b.l_linenumber | b.nr_pvp | b.max_nr_pvp |
> +---------------+-----------------+-----------+---------------+
> | 1.00 | 2 | 12 | 12 |
> | 2.00 | 1 | 12 | 12 |
> | 2.00 | 2 | 12 | 12 |
> +---------------+-----------------+-----------+---------------+
> {noformat}
> Explain plan for Impala shows that the equality predicate 'nr_pvp = max_nr_pvp' which is supposed to be applied on top of the analytic operator is missing:
> {noformat}
> +---------------------------------------------------------------------------+
> | Explain String |
> +---------------------------------------------------------------------------+
> | Max Per-Host Resource Reservation: Memory=42.88MB Threads=7 |
> | Per-Host Resource Estimates: Memory=483MB |
> | |
> | PLAN-ROOT SINK |
> | | |
> | 11:MERGING-EXCHANGE [UNPARTITIONED] |
> | | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | | |
> | 06:SORT |
> | | order by: l_quantity ASC, l_linenumber ASC, nr_pvp ASC, max_nr_pvp ASC |
> | | row-size=28B cardinality=357 |
> | | |
> | 05:ANALYTIC |
> | | functions: max(count(*)) |
> | | partition by: l_quantity |
> | | row-size=28B cardinality=357 |
> | | |
> | 04:SORT |
> | | order by: l_quantity ASC NULLS LAST |
> | | row-size=20B cardinality=357 |
> | | |
> | 10:EXCHANGE [HASH(l_quantity)] |
> | | |
> | 09:AGGREGATE [FINALIZE] |
> | | output: count:merge(*) |
> | | group by: l_quantity, l_linenumber |
> | | row-size=20B cardinality=357 |
> | | |
> | 08:EXCHANGE [HASH(l_quantity,l_linenumber)] |
> | | |
> | 03:AGGREGATE [STREAMING] |
> | | output: count(*) |
> | | group by: l_quantity, l_linenumber |
> | | row-size=20B cardinality=357 |
> | | |
> | 02:HASH JOIN [INNER JOIN, BROADCAST] |
> | | hash predicates: l_partkey = ps_partkey |
> | | runtime filters: RF000 <- ps_partkey |
> | | row-size=36B cardinality=2.45K |
> | | |
> | |--07:EXCHANGE [BROADCAST] |
> | | | |
> | | 01:SCAN HDFS [tpch.partsupp] |
> | | HDFS partitions=1/1 files=1 size=112.71MB |
> | | predicates: ps_suppkey = 10 |
> | | row-size=16B cardinality=82 |
> | | |
> | 00:SCAN HDFS [tpch.lineitem] |
> | HDFS partitions=1/1 files=1 size=718.94MB |
> | predicates: l_quantity IN (1, 2) |
> | runtime filters: RF000 -> l_partkey |
> | row-size=20B cardinality=235.34K |
> +---------------------------------------------------------------------------+
> {noformat}
> This is likely a regression since based on some internal testing this was working correctly on an earlier version.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org