You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Aman Sinha (Jira)" <ji...@apache.org> on 2021/11/19 06:39:00 UTC
[jira] [Created] (IMPALA-11030) Wrong result due to predicate pushdown into inline view with Analytic function

Aman Sinha created IMPALA-11030:
-----------------------------------

             Summary: Wrong result due to predicate pushdown into inline view with Analytic function
                 Key: IMPALA-11030
                 URL: https://issues.apache.org/jira/browse/IMPALA-11030
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 3.4.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha


Table DDL and population:
{noformat}
create table t1( c1 int, c2 char(1));
insert into t1 values (1,cast('P' as char(1))),(2,cast('P' as char(1))),(3,cast('P' as char(1))),(4,cast('N' as char(1))),(5,cast('P' as char(1))),(6, cast('N' as char(1))),(7, cast('P' as char(1))),(8, cast('N' as char(1))),(9, cast('N' as char(1))),(10, cast('N' as char(1))), (11,cast('P' as char(1))),(12,cast('N' as char(1))),(13,cast('P' as char(1))),(14,cast('N' as char(1))),(15,cast('N' as char(1))),(16, cast('N' as char(1))),(17, cast('P' as char(1))),(18, cast('N' as char(1))),(19, cast('P' as char(1))),(20, cast('N' as char(1)));
{noformat}

{noformat}
default> select * from t1;
------+

c1	c2
------+

11	P
12	N
13	P
14	N
15	N
16	N
17	P
18	N
19	P
20	N
1	P
2	P
3	P
4	N
5	P
6	N
7	P
8	N
9	N
10	N
------+

The following query produces a wrong num_row() for num_ranks column.

default> select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P';

-------------------------+

c1	c2	num_ranks	prime_rank
-------------------------+

1	P	1	1
2	P	2	2
3	P	3	3
5	P	4	4
7	P	5	5
11	P	6	6
13	P	7	7
17	P	8	8
19	P	9	9
-------------------------+

{noformat}

The plan indicates that the predicate c2='P' is incorrectly pushed to the scan and affects the order of operations in the SQL statement.

{noformat}
Query: explain select * from (select c1, c2 , row_number() over(order by c1) as num_ranks, row_number() over( partition by c2 order by c1) as prime_rank from t1) a where c2='P'

+------------------------------------------------------------------------------------------+
| Explain String                                                                           |
+------------------------------------------------------------------------------------------+
| Max Per-Host Resource Reservation: Memory=20.00MB Threads=2                              |
| Per-Host Resource Estimates: Memory=30MB                                                 |
| Codegen disabled by planner                                                              |
| Analyzed query: SELECT * FROM (SELECT c1, c2, row_number() OVER (ORDER BY c1             |
| ASC) num_ranks, row_number() OVER (PARTITION BY c2 ORDER BY c1 ASC) prime_rank           |
| FROM `default`.t1) a WHERE CAST(c2 AS STRING) = 'P'                                      |
|                                                                                          |
| F00:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1                                    |
| |  Per-Host Resources: mem-estimate=30.00MB mem-reservation=20.00MB thread-reservation=2 |
| PLAN-ROOT SINK                                                                           |
| |  output exprs: c1, c2, row_number(), row_number()                                      |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |                                                                                        |
| 04:ANALYTIC                                                                              |
| |  functions: row_number()                                                               |
| |  order by: c1 ASC                                                                      |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                              |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=7,3 row-size=21B cardinality=10                                             |
| |  in pipelines: 03(GETNEXT)                                                             |
| |                                                                                        |
| 03:SORT                                                                                  |
| |  order by: c1 ASC                                                                      |
| |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=7 row-size=13B cardinality=10                                               |
| |  in pipelines: 03(GETNEXT), 01(OPEN)                                                   |
| |                                                                                        |
| 02:ANALYTIC                                                                              |
| |  functions: row_number()                                                               |
| |  partition by: c2                                                                      |
| |  order by: c1 ASC                                                                      |
| |  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW                              |
| |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=5,4 row-size=13B cardinality=10                                             |
| |  in pipelines: 01(GETNEXT)                                                             |
| |                                                                                        |
| 01:SORT                                                                                  |
| |  order by: c2 ASC NULLS LAST, c1 ASC                                                   |
| |  mem-estimate=6.00MB mem-reservation=6.00MB spill-buffer=2.00MB thread-reservation=0   |
| |  tuple-ids=5 row-size=5B cardinality=10                                                |
| |  in pipelines: 01(GETNEXT), 00(OPEN)                                                   |
| |                                                                                        |
| 00:SCAN HDFS [default.t1]                                                                |
|    HDFS partitions=1/1 files=2 size=91B                                                  |
|    predicates: CAST(default.t1.c2 AS STRING) = 'P'                                       |
|    stored statistics:                                                                    |
|      table: rows=20 size=91B                                                             |
|      columns: all                                                                        |
|    extrapolated-rows=disabled max-scan-range-rows=10                                     |
|    mem-estimate=16.00MB mem-reservation=8.00KB thread-reservation=1                      |
|    tuple-ids=0 row-size=5B cardinality=10                                                |
|    in pipelines: 00(GETNEXT)                                                             |
+------------------------------------------------------------------------------------------+
{noformat}




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org