Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/02/04 02:34:00 UTC
[jira] [Created] (IMPALA-10473) Order by a constant should not be ignored in row_number()
Quanlong Huang created IMPALA-10473:
---------------------------------------
Summary: Order by a constant should not be ignored in row_number()
Key: IMPALA-10473
URL: https://issues.apache.org/jira/browse/IMPALA-10473
Project: IMPALA
Issue Type: Bug
Reporter: Quanlong Huang
Assignee: Quanlong Huang
[~thundergun] found a bug: row_number() ordered by a constant returns wrong results when there is more than one fragment instance:
{code:sql}
create table t1(c1 int) stored as textfile;
-- Insert 3 times to create 3 files
insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
-- Wrong plan missing a sort node after scan. Analytic is wrongly performed locally.
select row_number() over (order by '1') from t1;
+------------------------+
| row_number() OVER(...) |
+------------------------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+------------------------+
{code}
The plan shows that ANALYTIC is placed in the same fragment as SCAN, so row_number() is evaluated locally in each fragment instance, which produces wrong results.
{code:java}
F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
| Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
PLAN-ROOT SINK
| output exprs: row_number()
| mem-estimate=0B mem-reservation=0B thread-reservation=0
|
02:EXCHANGE [UNPARTITIONED]
| mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
| tuple-ids=0,2 row-size=8B cardinality=15
| in pipelines: 00(GETNEXT)
|
F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
Per-Host Resources: mem-estimate=36.00MB mem-reservation=4.01MB thread-reservation=2
01:ANALYTIC
| functions: row_number()
| order by: '1' ASC
| window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
| mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
| tuple-ids=0,2 row-size=8B cardinality=15
| in pipelines: 00(GETNEXT)
|
00:SCAN HDFS [default.t1, RANDOM]
HDFS partitions=1/1 files=3 size=60B
stored statistics:
table: rows=unavailable size=unavailable
columns: all
extrapolated-rows=disabled max-scan-range-rows=unavailable
mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
tuple-ids=0 row-size=0B cardinality=15
in pipelines: 00(GETNEXT)
{code}
This is an old issue that has existed since IMPALA-6323. IMPALA-6323 allows analytic functions to have a constant ORDER BY clause, and such clauses are always ignored. As a result, the analytic functions are evaluated locally instead of globally, which can produce incorrect results for functions such as row_number().
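The difference between local and global evaluation can be illustrated outside Impala with a small Python simulation. This is only a sketch, not Impala code: the `row_number` helper and the `scan_ranges` split (one list per file, ten rows each, as in the repro) are hypothetical names introduced for illustration.

```python
from itertools import chain


def row_number(rows):
    """Number rows 1..n in arrival order, as row_number() does when
    every row ties on a constant ORDER BY key (hypothetical helper)."""
    return [i for i, _ in enumerate(rows, start=1)]


# Assumed split: three scan ranges, one per inserted file, ten rows each.
scan_ranges = [[1] * 10 for _ in range(3)]

# Buggy plan shape: the analytic step runs inside each scan fragment
# instance, and the per-instance results are merged afterwards.
local_result = list(chain.from_iterable(row_number(r) for r in scan_ranges))

# Expected plan shape: rows are gathered into one stream first,
# and the analytic step runs once over all of them.
global_result = row_number(list(chain.from_iterable(scan_ranges)))

print(local_result)   # numbering restarts for every scan range
print(global_result)  # one continuous numbering over all rows
```

The local variant repeats the 1 to 10 sequence once per scan range, matching the wrong output above, while the global variant numbers all 30 rows continuously.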
--
This message was sent by Atlassian Jira
(v8.3.4#803005)