Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/05/10 03:21:00 UTC

[jira] [Created] (IMPALA-10697) NDV for rank() expression is incorrect

Aman Sinha created IMPALA-10697:
-----------------------------------

             Summary: NDV for rank() expression is incorrect
                 Key: IMPALA-10697
                 URL: https://issues.apache.org/jira/browse/IMPALA-10697
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
            Reporter: Aman Sinha


In the following query, the cardinality of the final Aggregate is always 1, regardless of the cardinality of its child. This appears to be because the NDV (number of distinct values) of an analytic expr such as RANK is always computed as 1, which is incorrect.
{noformat}

Query: explain select rnk, count(*) from (
select * from
 (SELECT rank() OVER (ORDER BY ss_net_profit ASC) rnk
    FROM store_sales ss1
    WHERE ss_store_sk = 4) v1
where rnk < 1000) v2
group by rnk
+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=13.94MB Threads=3                              |
| Per-Host Resource Estimates: Memory=142MB                                                |
| Analyzed query: SELECT rnk, count(*) FROM (SELECT * FROM (SELECT rank() OVER             |
| (ORDER BY ss_net_profit ASC) rnk FROM tpcds.store_sales ss1 WHERE ss_store_sk =          |
| CAST(4 AS INT)) v1 WHERE rnk < CAST(1000 AS BIGINT)) v2 GROUP BY rnk                     |
|                                                                                          |
| F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
| |  Per-Host Resources: mem-estimate=14.01MB mem-reservation=5.94MB thread-reservation=1  |
| PLAN-ROOT SINK                                                                           |
| |  output exprs: rnk, count(*)                                                           |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |                                                                                        |
| 04:AGGREGATE [FINALIZE]                                                                  |
| |  output: count(*)                                                                      |
| |  group by: rank()                                                                      |
| |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0 |
| |  tuple-ids=5 row-size=16B cardinality=1                                                |
| |  in pipelines: 04(GETNEXT), 06(OPEN)                                                   |
| |                                                                                        |
| 03:SELECT                                                                                |
| |  predicates: rank() < CAST(1000 AS BIGINT)                                             |
| |  mem-estimate=0B mem-reservation=0B thread-reservation=0                               |
| |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
| |  in pipelines: 06(GETNEXT)                                                             |
| |                                                                                        |
| 02:ANALYTIC                                                                              |
| |  functions: rank()                                                                     |
| |  order by: ss_net_profit ASC                                                           |
| |  window: RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                             |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=8,7 row-size=16B cardinality=999                                            |
| |  in pipelines: 06(GETNEXT)                                                             |
| |                                                                                        |
| 06:TOP-N                                                                                 |
| |  order by: ss_net_profit ASC                                                           |
| |  limit with ties: 999                                                                  |
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 06(GETNEXT), 01(OPEN)                                                   |
| |                                                                                        |
| 05:EXCHANGE [UNPARTITIONED]                                                              |
| |  mem-estimate=37.72KB mem-reservation=0B thread-reservation=0                          |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 01(GETNEXT)                                                             |
| |                                                                                        |
| F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3                                           |
| Per-Host Resources: mem-estimate=128.01MB mem-reservation=8.00MB thread-reservation=2    |
| 01:TOP-N                                                                                 |
| |  order by: ss_net_profit ASC                                                           |
| |  limit with ties: 999                                                                  |
| |  source expr: rank() < CAST(1000 AS BIGINT)                                            |
| |  mem-estimate=7.80KB mem-reservation=0B thread-reservation=0                           |
| |  tuple-ids=8 row-size=8B cardinality=999                                               |
| |  in pipelines: 01(GETNEXT), 00(OPEN)                                                   |
| |                                                                                        |
| 00:SCAN HDFS [tpcds.store_sales ss1, RANDOM]                                             |
|    HDFS partitions=1824/1824 files=1824 size=346.60MB                                    |
|    predicates: ss_store_sk = CAST(4 AS INT)                                              |
|    stored statistics:                                                                    |
|      table: rows=2.88M size=346.60MB                                                     |
|      partitions: 1824/1824 rows=2.88M                                                    |
|      columns: all                                                                        |
|    extrapolated-rows=disabled max-scan-range-rows=130.09K                                |
|    mem-estimate=128.00MB mem-reservation=8.00MB thread-reservation=1                     |
|    tuple-ids=0 row-size=8B cardinality=480.07K                                           |
|    in pipelines: 00(GETNEXT)                                                             |
+------------------------------------------------------------------------------------------+
{noformat}
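
The effect in the plan above (04:AGGREGATE reports cardinality=1 over a child with cardinality=999) can be illustrated with a simplified model of group-by cardinality estimation. This is only a sketch and not Impala's planner code: the function names are hypothetical, and it assumes the estimate is the product of the grouping exprs' NDVs capped by the input row count. The alternative NDV shown for rank() is just the trivial upper bound (one distinct value per input row), not a proposed fix.

```python
def estimate_agg_cardinality(grouping_ndvs, input_cardinality):
    """Simplified model (assumption): output rows of a GROUP BY are the
    product of the grouping exprs' NDVs, capped by the input row count."""
    estimate = 1
    for ndv in grouping_ndvs:
        estimate *= ndv
    return min(estimate, input_cardinality)


def rank_ndv_upper_bound(input_cardinality):
    """Hypothetical helper: rank() can produce at most one distinct
    value per input row, so the input cardinality bounds its NDV."""
    return max(1, input_cardinality)


# Child of the aggregate (03:SELECT) has cardinality=999 in the plan.
child_rows = 999

# With NDV(rank()) fixed at 1, the estimate collapses to 1 for any child size.
with_ndv_one = estimate_agg_cardinality([1], child_rows)

# With NDV bounded by the input rows, the estimate tracks the child instead.
with_bounded_ndv = estimate_agg_cardinality(
    [rank_ndv_upper_bound(child_rows)], child_rows
)

print(with_ndv_one, with_bounded_ndv)  # prints: 1 999
```

Under this model, any NDV of 1 for the grouping expr forces the aggregate estimate to 1 no matter how many rows the child produces, which matches the behavior reported here.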



