Posted to reviews@impala.apache.org by "pengdou (Code Review)" <ge...@cloudera.org> on 2021/09/03 02:38:14 UTC

[Impala-ASF-CR] IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect

pengdou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17792 )

Change subject: IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect
......................................................................

IMPALA-10785: when union kudu table and hdfs table, union pass through does not take effect

Kudu slots may be non-nullable, and all Kudu string slots are padded by 4 bytes, so rows
from a Kudu scan node may not pass through a union node. To resolve the problem, when a
Kudu scan node is unioned with a non-Kudu scan node, we remove the string padding and make
non-nullable slots nullable. When a Kudu scan node is unioned with another Kudu scan node,
we keep the Kudu string padding and add padding to the union node's string slots.
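
The rule above can be illustrated with a small model. This is only a sketch and not
Impala's actual code: the Slot class, its fields, and the three helper functions are
hypothetical names, and the model assumes that a child can pass through only when its
slot layout matches the union's layout slot for slot.

```python
from dataclasses import dataclass


# Hypothetical model of a slot; this is NOT Impala's SlotDescriptor.
@dataclass(frozen=True)
class Slot:
    col_type: str         # e.g. "BIGINT" or "STRING"
    nullable: bool
    padded: bool = False  # True if a string slot carries the 4 bytes of Kudu padding


def can_pass_through(child, union):
    """Assumed rule: rows are forwarded unchanged only if every slot matches."""
    return len(child) == len(union) and all(c == u for c, u in zip(child, union))


def relax_kudu_slots(slots):
    """Kudu unioned with non-Kudu: drop the padding, make every slot nullable."""
    return [Slot(s.col_type, True, False) for s in slots]


def pad_union_slots(slots):
    """Kudu unioned with Kudu: pad the union's string slots to match Kudu's."""
    return [Slot(s.col_type, s.nullable, s.col_type == "STRING") for s in slots]


# Illustrative layouts (made up for this example).
kudu = [Slot("BIGINT", nullable=False), Slot("STRING", nullable=True, padded=True)]
union = [Slot("BIGINT", nullable=True), Slot("STRING", nullable=True)]

print(can_pass_through(kudu, union))                    # layouts differ
print(can_pass_through(relax_kudu_slots(kudu), union))  # layouts now match
```

In this model the first call reports a mismatch because of both the non-nullable slot
and the padded string slot; relaxing the Kudu slots removes both differences.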

The patch allows more Kudu rows to pass through the union node and thus improves query
performance, as shown below.

1. Union of two Kudu tables.
SQL:
select max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name),
ndv(c_last_name) from (select c_customer_sk, c_customer_id, c_salutation,
c_first_name, c_last_name from tpcds_10000_parquet.customer_kudu union all
select c_customer_sk, c_customer_id, c_salutation, c_first_name, c_last_name
from tpcds_10000_parquet.customer_kudu ) t

With the patch: Duration: 2s606ms; all Kudu scan nodes can pass through.
Operator                Avg Time   Detail
---------------------------------------------------------------------
F03:ROOT                 0.000ns
05:AGGREGATE             0.000ns   FINALIZE
04:EXCHANGE              0.000ns   UNPARTITIONED
F02:EXCHANGE SENDER      0.000ns
03:AGGREGATE             1s170ms
00:UNION                 1.000ms
|--02:SCAN KUDU        663.015ms   tpcds_10000_parquet.customer_kudu
01:SCAN KUDU           609.014ms   tpcds_10000_parquet.customer_kudu

Without the patch: Duration: 3s316ms; no Kudu scan node can pass through.
Operator                Avg Time   Detail
--------------------------------------------------------------------
F03:ROOT                 0.000ns
05:AGGREGATE             0.000ns   FINALIZE
04:EXCHANGE              0.000ns   UNPARTITIONED
F02:EXCHANGE SENDER      0.000ns
03:AGGREGATE             1s150ms
00:UNION                 1s254ms
|--02:SCAN KUDU        180.004ms   tpcds_10000_parquet.customer_kudu
01:SCAN KUDU           505.012ms   tpcds_10000_parquet.customer_kudu

2. Union of a Kudu table and a Parquet table.
SQL:
select max(c_customer_sk), ndv(c_customer_id), ndv(c_salutation), ndv(c_first_name),
ndv(c_last_name) from (select c_customer_sk, c_customer_id, c_salutation,
c_first_name, c_last_name from tpcds_10000_parquet.customer_kudu union all
select c_customer_sk, c_customer_id, c_salutation, c_first_name, c_last_name
from tpcds_10000_parquet.customer_parquet ) t

With the patch: Duration: 2s138ms; all scan nodes can pass through.
Operator               Avg Time    Detail
------------------------------------------------------------------------
F03:ROOT                0.000ns
05:AGGREGATE            0.000ns    FINALIZE
04:EXCHANGE             0.000ns    UNPARTITIONED
F02:EXCHANGE SENDER     0.000ns
03:AGGREGATE            1s243ms
00:UNION                1.000ms
|--02:SCAN HDFS        52.001ms    tpcds_10000_parquet.customer_parquet
01:SCAN KUDU          623.017ms    tpcds_10000_parquet.customer_kudu

Without the patch: Duration: 2s473ms; only the HDFS scan node can pass through.
Operator               Avg Time   Detail
-----------------------------------------------------------------------
F03:ROOT                0.000ns
05:AGGREGATE            0.000ns   FINALIZE
04:EXCHANGE             0.000ns   UNPARTITIONED
F02:EXCHANGE SENDER     1.000ms
03:AGGREGATE            1s165ms
00:UNION              565.015ms
|--01:SCAN KUDU       693.019ms   tpcds_10000_parquet.customer_kudu
02:SCAN HDFS           67.001ms   tpcds_10000_parquet.customer_parquet
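
For readers who want to compare the durations quoted above, the following sketch
converts the duration strings to milliseconds and computes the ratio. The helper names
to_ms and speedup are hypothetical, and the parser only assumes the unit suffixes that
appear in these summaries (s, ms, ns) plus a few obvious neighbours.

```python
import re

# Milliseconds per unit; "ms" is listed before "m" and "s" in the pattern
# so that the longer suffix is matched first.
_UNIT_MS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1_000.0,
            "ms": 1.0, "us": 1e-3, "ns": 1e-6}
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")


def to_ms(duration):
    """Convert a string such as '2s606ms' or '663.015ms' to milliseconds."""
    return sum(float(value) * _UNIT_MS[unit]
               for value, unit in _PART.findall(duration))


def speedup(without_patch, with_patch):
    """Ratio of the unpatched duration to the patched duration."""
    return to_ms(without_patch) / to_ms(with_patch)


# Durations taken from the two cases above.
print(f"Kudu + Kudu:    {speedup('3s316ms', '2s606ms'):.2f}x")
print(f"Kudu + Parquet: {speedup('2s473ms', '2s138ms'):.2f}x")
```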

Change-Id: I35cc8ec83ad4a544405198826d98d41d9879d948
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M be/src/runtime/descriptors.cc
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
M fe/src/main/java/org/apache/impala/planner/KuduScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/analytic-fns.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-planner/queries/PlannerTest/union.test
13 files changed, 325 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/92/17792/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17792
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I35cc8ec83ad4a544405198826d98d41d9879d948
Gerrit-Change-Number: 17792
Gerrit-PatchSet: 6
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: pengdou <pe...@126.com>