Posted to issues@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2021/02/05 12:25:00 UTC
[jira] [Resolved] (IMPALA-10473) Order by a constant should not be ignored in row_number()
[ https://issues.apache.org/jira/browse/IMPALA-10473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Quanlong Huang resolved IMPALA-10473.
-------------------------------------
Fix Version/s: Impala 4.0
Resolution: Fixed
> Order by a constant should not be ignored in row_number()
> ---------------------------------------------------------
>
> Key: IMPALA-10473
> URL: https://issues.apache.org/jira/browse/IMPALA-10473
> Project: IMPALA
> Issue Type: Bug
> Components: Frontend
> Affects Versions: Impala 3.1.0, Impala 3.2.0, Impala 3.3.0, Impala 3.4.0
> Reporter: Quanlong Huang
> Assignee: Quanlong Huang
> Priority: Critical
> Labels: correctness
> Fix For: Impala 4.0
>
>
> [~thundergun] found a bug: row_number() ordered by a constant returns wrong results when there is more than one fragment instance:
> {code:sql}
> create table t1(c1 int) stored as textfile;
> -- Insert 3 times to create 3 files
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1);
> -- Wrong plan: the sort node after the scan is missing, so the analytic is wrongly performed locally.
> set exec_single_node_rows_threshold=0;
> select row_number() over (order by '1') from t1;
> +------------------------+
> | row_number() OVER(...) |
> +------------------------+
> | 1 |
> | 2 |
> | 3 |
> | 4 |
> | 5 |
> | 6 |
> | 7 |
> | 8 |
> | 9 |
> | 10 |
> | 1 |
> | 2 |
> | 3 |
> | 4 |
> | 5 |
> | 6 |
> | 7 |
> | 8 |
> | 9 |
> | 10 |
> | 1 |
> | 2 |
> | 3 |
> | 4 |
> | 5 |
> | 6 |
> | 7 |
> | 8 |
> | 9 |
> | 10 |
> +------------------------+
> {code}
> The plan shows that ANALYTIC is placed in the same fragment as SCAN (F00, instances=3), so row_number() is computed locally in each fragment instance, which produces the wrong results.
> {code:java}
> F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
> | Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
> PLAN-ROOT SINK
> | output exprs: row_number()
> | mem-estimate=0B mem-reservation=0B thread-reservation=0
> |
> 02:EXCHANGE [UNPARTITIONED]
> | mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
> | tuple-ids=0,2 row-size=8B cardinality=15
> | in pipelines: 00(GETNEXT)
> |
> F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
> Per-Host Resources: mem-estimate=36.00MB mem-reservation=4.01MB thread-reservation=2
> 01:ANALYTIC
> | functions: row_number()
> | order by: '1' ASC
> | window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
> | mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
> | tuple-ids=0,2 row-size=8B cardinality=15
> | in pipelines: 00(GETNEXT)
> |
> 00:SCAN HDFS [default.t1, RANDOM]
> HDFS partitions=1/1 files=3 size=60B
> stored statistics:
> table: rows=unavailable size=unavailable
> columns: all
> extrapolated-rows=disabled max-scan-range-rows=unavailable
> mem-estimate=32.00MB mem-reservation=8.00KB thread-reservation=1
> tuple-ids=0 row-size=0B cardinality=15
> in pipelines: 00(GETNEXT) {code}
> This is a long-standing issue that dates back to IMPALA-6323 and IMPALA-8069. IMPALA-6323 allows analytic functions to have a constant ORDER BY clause, and since IMPALA-8069 such clauses are always ignored. As a result, the analytic function is evaluated locally in each fragment instance instead of globally, which produces incorrect results for functions such as row_number().
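
The difference between local and global evaluation described above can be illustrated with a small simulation. This is only a sketch, not Impala code: the function names `row_number_local` and `row_number_global` are hypothetical, and each inner list simply stands for the rows read by one fragment instance (one per file in the repro).

```python
# Hypothetical simulation of the behavior in the report; not Impala code.
# Each inner list stands for the rows scanned by one fragment instance.

def row_number_local(instances):
    """Number rows inside each instance independently (the buggy plan shape)."""
    numbers = []
    for rows in instances:
        # The counter restarts at 1 for every instance.
        numbers.extend(range(1, len(rows) + 1))
    return numbers


def row_number_global(instances):
    """Gather all rows into a single stream first, then number them once."""
    merged = [row for rows in instances for row in rows]
    return list(range(1, len(merged) + 1))


# Three instances with ten rows each, mirroring the three inserts in the repro.
instances = [[1] * 10 for _ in range(3)]

local_numbers = row_number_local(instances)
global_numbers = row_number_global(instances)

print(local_numbers)   # 1..10 repeated three times, as in the wrong output
print(global_numbers)  # 1..30, each row number appearing exactly once
```

With local numbering the sequence 1..10 repeats once per instance, which matches the wrong output in the report. Numbering after the rows are gathered yields unique values from 1 to 30.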