Posted to reviews@impala.apache.org by "pengdou (Code Review)" <ge...@cloudera.org> on 2021/09/03 02:38:14 UTC
[Impala-ASF-CR] IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect
pengdou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17792 )
Change subject: IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect
......................................................................
IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect
Kudu slots may be non-nullable, and every Kudu string slot carries 4 bytes of padding, so
the tuple layout of a Kudu scan node may not match that of the union node, and its rows
then cannot pass through. To resolve the problem, when a Kudu scan node is unioned with a
non-Kudu scan node, we remove the string padding and mark non-nullable slots as nullable.
When Kudu is unioned with Kudu, we keep the Kudu string padding and add padding to the
union node's string slots.
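The rule described above can be sketched in a few lines of Python. This is only an illustrative model, not the code in the patch: the `Slot` class and the functions `can_pass_through`, `adjust_kudu_slots` and `adjust_union_slots` are hypothetical names, and the sketch assumes that pass-through requires the child's slots to match the union's slots one by one. The Kudu-with-Kudu example uses nullable columns only, since the commit message does not say how nullability is handled in that case.

```python
from dataclasses import dataclass, replace

KUDU_STRING_PADDING = 4  # from the commit message: Kudu string slots pad 4 bytes


@dataclass(frozen=True)
class Slot:
    """Hypothetical, simplified stand-in for a slot descriptor."""
    type_name: str
    nullable: bool
    padding: int = 0


def can_pass_through(child_slots, union_slots):
    # Assumption: rows pass through only if both layouts match slot by slot.
    return list(child_slots) == list(union_slots)


def adjust_kudu_slots(kudu_slots, all_children_kudu):
    # Kudu with Kudu: keep the Kudu layout unchanged.
    if all_children_kudu:
        return list(kudu_slots)
    # Kudu with non-Kudu: drop string padding and make every slot nullable.
    return [replace(s, nullable=True, padding=0) for s in kudu_slots]


def adjust_union_slots(union_slots, all_children_kudu):
    # Kudu with Kudu: pad the union's string slots like Kudu does.
    pad = KUDU_STRING_PADDING if all_children_kudu else 0
    return [replace(s, padding=pad if s.type_name == "STRING" else 0)
            for s in union_slots]


union = [Slot("BIGINT", nullable=True), Slot("STRING", nullable=True)]

# Case 1: Kudu child unioned with a non-Kudu child.
kudu = [Slot("BIGINT", nullable=False),
        Slot("STRING", nullable=True, padding=KUDU_STRING_PADDING)]
print(can_pass_through(kudu, union))  # False
print(can_pass_through(adjust_kudu_slots(kudu, False),
                       adjust_union_slots(union, False)))  # True

# Case 2: Kudu child unioned with another Kudu child (nullable columns assumed).
kudu_nullable = [Slot("BIGINT", nullable=True),
                 Slot("STRING", nullable=True, padding=KUDU_STRING_PADDING)]
print(can_pass_through(adjust_kudu_slots(kudu_nullable, True),
                       adjust_union_slots(union, True)))  # True
```

In this model the unpatched behavior corresponds to the first `False`: the non-nullable BIGINT slot and the padded STRING slot both differ from the union's layout.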
The patch allows more Kudu rows to pass through the union node and thus improves query
performance, as shown below.
1. Union of two Kudu tables.
SQL:
select max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name),
ndv(c_last_name) from (select c_customer_sk, c_customer_id, c_salutation,
c_first_name, c_last_name from tpcds_10000_parquet.customer_kudu union all
select c_customer_sk, c_customer_id, c_salutation, c_first_name, c_last_name
from tpcds_10000_parquet.customer_kudu ) t
With the patch: Duration: 2s606ms; all Kudu scan nodes can pass through.
Operator             Avg Time   Detail
---------------------------------------------------------------------
F03:ROOT             0.000ns
05:AGGREGATE         0.000ns    FINALIZE
04:EXCHANGE          0.000ns    UNPARTITIONED
F02:EXCHANGE SENDER  0.000ns
03:AGGREGATE         1s170ms
00:UNION             1.000ms
|--02:SCAN KUDU      663.015ms  tpcds_10000_parquet.customer_kudu
01:SCAN KUDU         609.014ms  tpcds_10000_parquet.customer_kudu
Without the patch: Duration: 3s316ms; no Kudu scan node can pass through.
Operator             Avg Time   Detail
---------------------------------------------------------------------
F03:ROOT             0.000ns
05:AGGREGATE         0.000ns    FINALIZE
04:EXCHANGE          0.000ns    UNPARTITIONED
F02:EXCHANGE SENDER  0.000ns
03:AGGREGATE         1s150ms
00:UNION             1s254ms
|--02:SCAN KUDU      180.004ms  tpcds_10000_parquet.customer_kudu
01:SCAN KUDU         505.012ms  tpcds_10000_parquet.customer_kudu
2. Union of a Kudu table and a Parquet table.
SQL:
select max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name),
ndv(c_last_name) from (select c_customer_sk, c_customer_id, c_salutation,
c_first_name, c_last_name from tpcds_10000_parquet.customer_kudu union all
select c_customer_sk, c_customer_id, c_salutation, c_first_name, c_last_name
from tpcds_10000_parquet.customer_parquet ) t
With the patch: Duration: 2s138ms; all scan nodes can pass through.
Operator             Avg Time   Detail
---------------------------------------------------------------------
F03:ROOT             0.000ns
05:AGGREGATE         0.000ns    FINALIZE
04:EXCHANGE          0.000ns    UNPARTITIONED
F02:EXCHANGE SENDER  0.000ns
03:AGGREGATE         1s243ms
00:UNION             1.000ms
|--02:SCAN HDFS      52.001ms   tpcds_10000_parquet.customer_parquet
01:SCAN KUDU         623.017ms  tpcds_10000_parquet.customer_kudu
Without the patch: Duration: 2s473ms; only the HDFS scan node can pass through.
Operator             Avg Time   Detail
---------------------------------------------------------------------
F03:ROOT             0.000ns
05:AGGREGATE         0.000ns    FINALIZE
04:EXCHANGE          0.000ns    UNPARTITIONED
F02:EXCHANGE SENDER  1.000ms
03:AGGREGATE         1s165ms
00:UNION             565.015ms
|--01:SCAN KUDU      693.019ms  tpcds_10000_parquet.customer_kudu
02:SCAN HDFS         67.001ms   tpcds_10000_parquet.customer_parquet
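To put the durations reported above side by side, the following Python sketch converts the duration strings to milliseconds and computes the relative reduction in total query time. The helper names `to_ms` and `reduction_pct` are hypothetical, and the parser only handles the `s`, `ms`, `us` and `ns` units that appear in these summaries; it is not a general parser for Impala profile output.

```python
import re

# Conversion factors to milliseconds for the units seen in the summaries above.
UNIT_TO_MS = {"s": 1000.0, "ms": 1.0, "us": 1e-3, "ns": 1e-6}


def to_ms(text):
    """Convert a duration such as '2s606ms' or '565.015ms' to milliseconds."""
    parts = re.findall(r"(\d+(?:\.\d+)?)(ns|us|ms|s)", text)
    return sum(float(value) * UNIT_TO_MS[unit] for value, unit in parts)


def reduction_pct(before, after):
    """Relative reduction from 'before' to 'after', in percent."""
    b, a = to_ms(before), to_ms(after)
    return round((b - a) / b * 100, 1)


# Durations copied from the two experiments above.
print(reduction_pct("3s316ms", "2s606ms"))  # Kudu + Kudu:    21.4
print(reduction_pct("2s473ms", "2s138ms"))  # Kudu + Parquet: 13.5
```

These figures come from the single runs quoted in this message, so they indicate the direction of the change rather than a measured average.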
Change-Id: I35cc8ec83ad4a544405198826d98d41d9879d948
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/descriptors.cc
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
13 files changed, 325 insertions(+), 16 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/17792/6
--
To view, visit http://gerrit.cloudera.org:8080/17792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35cc8ec83ad4a544405198826d98d41d9879d948
Gerrit-Change-Number: 17792
Gerrit-PatchSet: 6
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: pengdou <pe...@126.com>