Posted to reviews@impala.apache.org by "pengdou (Code Review)" <ge...@cloudera.org> on 2021/08/04 06:21:59 UTC
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
pengdou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
On a 50 GB TPC-DS test dataset, the test SQL is:
select max(c_customer_sk),
       ndv(c_customer_id),
       ndv(c_salutation),
       ndv(c_first_name),
       ndv(c_last_name)
from (
  select c_customer_sk,
         c_customer_id,
         c_salutation,
         c_first_name,
         c_last_name
  from customer_parquet
  union all
  select c_customer_sk,
         c_customer_id,
         c_salutation,
         c_first_name,
         c_last_name
  from customer_kudu
) t
The test results are as follows:
Official solution (Kudu string slots are filled with 4 bytes of padding, so only the HDFS table can pass through):
Operator              #Hosts  #Inst  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------
F03:ROOT              1       1      0.000ns    0.000ns                            4.01 MB    4.00 MB
05:AGGREGATE          1       1      0.000ns    0.000ns    1       1           16.00 KB   16.00 KB       FINALIZE
04:EXCHANGE           1       1      0.000ns    0.000ns    3       1           32.00 KB   16.00 KB       UNPARTITIONED
F02:EXCHANGE SENDER   3       3      0.000ns    0.000ns                            24.00 B    0
03:AGGREGATE          3       3      460.345ms  481.012ms  3       1           1.28 MB    16.00 KB
00:UNION              3       3      261.673ms  282.007ms  24.51M  23.34M      219.00 KB  0
|--02:SCAN KUDU       3       3      835.022ms  885.023ms  12.26M  -1          5.04 MB    7.50 MB        tpcds_10000_parquet.customer_kudu
01:SCAN HDFS          3       3      133.003ms  139.003ms  12.26M  23.34M      70.34 MB   352.00 MB      tpcds_10000_parquet.customer_parquet
My solution (remove the string padding in the Impala Kudu scanner so that the Kudu table can also pass through):
Operator              #Hosts  #Inst  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------
F03:ROOT              1       1      0.000ns    0.000ns                            4.01 MB    4.00 MB
05:AGGREGATE          1       1      0.000ns    0.000ns    1       1           16.00 KB   16.00 KB       FINALIZE
04:EXCHANGE           1       1      0.000ns    0.000ns    3       1           32.00 KB   16.00 KB       UNPARTITIONED
F02:EXCHANGE SENDER   3       3      0.000ns    0.000ns                            24.00 B    0
03:AGGREGATE          3       3      435.678ms  480.012ms  3       1           1.28 MB    16.00 KB
00:UNION              3       3      2.333ms    5.000ms    24.51M  23.34M      4.00 KB    0
|--02:SCAN KUDU       3       3      1s017ms    1s139ms    12.26M  -1          3.75 MB    7.50 MB        tpcds_10000_parquet.customer_kudu
01:SCAN HDFS          3       3      130.336ms  159.004ms  12.26M  23.34M      70.30 MB   352.00 MB      tpcds_10000_parquet.customer_parquet
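To illustrate why the padding blocks passthrough, here is a minimal C++ sketch that models a tuple as slots laid out back to back in column order. It assumes a 12-byte Impala string slot (the string_value_length_in_use default quoted in the patch), a 16-byte padded Kudu string slot (12 plus the 4-byte kudu_scanner_string_pading default) and an 8-byte c_customer_sk slot. Slot, Layout, ComputeLayout and CanPassThrough are hypothetical names, and the layout-equality test is only a rough stand-in for the planner's real passthrough check, which also deals with slot reordering and null indicators.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of one slot in a tuple.
struct Slot {
  std::string name;
  bool is_string;
};

// Hypothetical, simplified tuple layout: slot offsets plus total byte size.
struct Layout {
  std::vector<int> offsets;
  int byte_size = 0;
};

// Lays slots out back to back in column order. Fixed-size slots take
// `fixed_size` bytes and string slots take `string_size` bytes. Real Impala
// layouts also reorder slots and reserve null-indicator bytes; both are
// ignored here.
Layout ComputeLayout(const std::vector<Slot>& slots, int fixed_size,
                     int string_size) {
  assert(fixed_size > 0 && string_size > 0);
  Layout layout;
  for (const Slot& s : slots) {
    layout.offsets.push_back(layout.byte_size);
    layout.byte_size += s.is_string ? string_size : fixed_size;
  }
  return layout;
}

// Rough stand-in for the passthrough condition: a child can hand its rows
// to the UNION unchanged only if both tuple layouts match exactly.
bool CanPassThrough(const Layout& child, const Layout& parent) {
  return child.byte_size == parent.byte_size && child.offsets == parent.offsets;
}
```

For the five columns in the test query, ComputeLayout(cols, 8, 12) yields a 56-byte tuple with offsets 0, 8, 20, 32 and 44, while ComputeLayout(cols, 8, 16) yields 72 bytes. Under this model CanPassThrough is therefore false for the padded Kudu layout and true once the padding is removed.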
Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
5 files changed, 63 insertions(+), 8 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/17749/1
--
To view, visit http://gerrit.cloudera.org:8080/17749
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
Patch Set 1:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/9234/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 06:46:10 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
Patch Set 3:
Build Successful
https://jenkins.impala.io/job/gerrit-code-review-checks/9249/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 03:32:45 +0000
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
Patch Set 3:
(3 comments)
Thanks for uploading the patch!
http://gerrit.cloudera.org:8080/#/c/17749/3//COMMIT_MSG
Commit Message:
http://gerrit.cloudera.org:8080/#/c/17749/3//COMMIT_MSG@35
PS3, Line 35: the test result as following:
Could you share the elapsed time of these two queries?
http://gerrit.cloudera.org:8080/#/c/17749/3/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:
http://gerrit.cloudera.org:8080/#/c/17749/3/be/src/exec/kudu-scanner.cc@66
PS3, Line 66: DEFINE_int32(kudu_scanner_string_pading, 4, "string padding in kudu client");
: DEFINE_int32(string_value_length_in_use, 12, "string value length in use");
We can't adjust these per query if they are defined as startup flags. I think the ideal solution is to detect in the planner whether this optimization can be turned on. We can be conservative at first and only use it in a UNION clause that has both HDFS and Kudu scans.
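A minimal sketch of the conservative, per-query check suggested above, written against a hypothetical ChildKind enum rather than Impala's real plan node classes (the planner itself lives in the Java frontend). It enables padding removal only for a UNION that has at least one HDFS scan child and one Kudu scan child; bailing out on any other child kind is an extra assumption made here for safety, not something stated in the review.

```cpp
#include <vector>

// Hypothetical, simplified classification of a UNION's children.
enum class ChildKind { kHdfsScan, kKuduScan, kOther };

// Conservative per-query decision: strip the Kudu string padding only for a
// UNION that mixes HDFS and Kudu scans. Any other child kind disables it
// (an assumption, chosen to keep the optimization narrow at first).
bool ShouldStripKuduPadding(const std::vector<ChildKind>& union_children) {
  bool has_hdfs = false;
  bool has_kudu = false;
  for (ChildKind kind : union_children) {
    switch (kind) {
      case ChildKind::kHdfsScan:
        has_hdfs = true;
        break;
      case ChildKind::kKuduScan:
        has_kudu = true;
        break;
      case ChildKind::kOther:
        return false;
    }
  }
  return has_hdfs && has_kudu;
}
```

Because the decision is made per UNION at planning time, it could travel to the Kudu scanner with the plan instead of being fixed by startup flags; how that would be wired into Impala is left open here.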
http://gerrit.cloudera.org:8080/#/c/17749/3/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:
http://gerrit.cloudera.org:8080/#/c/17749/3/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@335
PS3, Line 335: d.setIsNullable(true);
Can we set this only when the optimization takes place? I don't think it will benefit all kinds of queries.
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 00:48:09 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
Patch Set 1:
(3 comments)
http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:
http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@476
PS1, Line 476: copy_length = row_size_in_rowbatch - (varlength_string_slot_offsets_[i] + move_forward_offsets + FLAGS_string_value_length_in_use);
line too long (137 > 90)
http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@478
PS1, Line 478: copy_length = varlength_string_slot_offsets_[i+1] - varlength_string_slot_offsets_[i];
line too long (92 > 90)
http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@482
PS1, Line 482: kuduTuple + varlength_string_slot_offsets_[i] + FLAGS_string_value_length_in_use + move_forward_offsets,
line too long (110 > 90)
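Only fragments of the patch are quoted above, so the following is not the patch's code but a sketch of the same idea, kept within the 90-column limit: compact a padded Kudu row in place by dropping the padding bytes of every string slot and shifting the rest of the row forward with memmove. It assumes that the padding sits right after the first 12 bytes of each 16-byte string slot and that the string slot offsets are sorted in ascending order; StripStringPadding is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: removes `padding` bytes that follow the first `used`
// bytes of every string slot in a row, shifting the rest of the row forward
// in place. `string_offsets` are slot offsets in the padded row, ascending.
// Returns the size of the compacted row.
int StripStringPadding(std::uint8_t* row, int row_size,
                       const std::vector<int>& string_offsets, int used,
                       int padding) {
  assert(used >= 0 && padding >= 0);
  int shift = 0;  // padding bytes removed so far
  for (std::size_t i = 0; i < string_offsets.size(); ++i) {
    // First byte after this slot's padding, in padded-row coordinates.
    const int src = string_offsets[i] + used + padding;
    // Move everything up to the end of the next slot's used bytes (or to
    // the end of the row) in one piece.
    const int end = (i + 1 < string_offsets.size())
                        ? string_offsets[i + 1] + used
                        : row_size;
    assert(src <= end && end <= row_size);
    shift += padding;
    std::memmove(row + src - shift, row + src,
                 static_cast<std::size_t>(end - src));
  }
  return row_size - shift;
}
```

For a 44-byte row with string slots at offsets 8 and 24, StripStringPadding(row, 44, {8, 24}, 12, 4) returns 36 and leaves bytes 0-19, 24-35 and 40-43 of the original row packed at the front. Walking the slots in ascending order is what makes one memmove per slot safe: each destination range ends before the next source range begins.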
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 06:22:45 +0000
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "pengdou (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins,
I'd like you to reexamine a change. Please visit
http://gerrit.cloudera.org:8080/17749
to look at the new patch set (#3).
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
On a 50 GB TPC-DS test dataset, the test SQL is:
select max(c_customer_sk),
       ndv(c_customer_id),
       ndv(c_salutation),
       ndv(c_first_name),
       ndv(c_last_name)
from (
  select c_customer_sk,
         c_customer_id,
         c_salutation,
         c_first_name,
         c_last_name
  from customer_parquet
  union all
  select c_customer_sk,
         c_customer_id,
         c_salutation,
         c_first_name,
         c_last_name
  from customer_kudu
) t
The test results are as follows:
Official solution (Kudu string slots are filled with 4 bytes of padding, so only the HDFS table can pass through):
Operator              #Hosts  #Inst  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------
F03:ROOT              1       1      0.000ns    0.000ns                            4.01 MB    4.00 MB
05:AGGREGATE          1       1      0.000ns    0.000ns    1       1           16.00 KB   16.00 KB       FINALIZE
04:EXCHANGE           1       1      0.000ns    0.000ns    3       1           32.00 KB   16.00 KB       UNPARTITIONED
F02:EXCHANGE SENDER   3       3      0.000ns    0.000ns                            24.00 B    0
03:AGGREGATE          3       3      460.345ms  481.012ms  3       1           1.28 MB    16.00 KB
00:UNION              3       3      261.673ms  282.007ms  24.51M  23.34M      219.00 KB  0
|--02:SCAN KUDU       3       3      835.022ms  885.023ms  12.26M  -1          5.04 MB    7.50 MB        tpcds_10000_parquet.customer_kudu
01:SCAN HDFS          3       3      133.003ms  139.003ms  12.26M  23.34M      70.34 MB   352.00 MB      tpcds_10000_parquet.customer_parquet
My solution (remove the string padding in the Impala Kudu scanner so that the Kudu table can also pass through):
Operator              #Hosts  #Inst  Avg Time   Max Time   #Rows   Est. #Rows  Peak Mem   Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------
F03:ROOT              1       1      0.000ns    0.000ns                            4.01 MB    4.00 MB
05:AGGREGATE          1       1      0.000ns    0.000ns    1       1           16.00 KB   16.00 KB       FINALIZE
04:EXCHANGE           1       1      0.000ns    0.000ns    3       1           32.00 KB   16.00 KB       UNPARTITIONED
F02:EXCHANGE SENDER   3       3      0.000ns    0.000ns                            24.00 B    0
03:AGGREGATE          3       3      435.678ms  480.012ms  3       1           1.28 MB    16.00 KB
00:UNION              3       3      2.333ms    5.000ms    24.51M  23.34M      4.00 KB    0
|--02:SCAN KUDU       3       3      1s017ms    1s139ms    12.26M  -1          3.75 MB    7.50 MB        tpcds_10000_parquet.customer_kudu
01:SCAN HDFS          3       3      130.336ms  159.004ms  12.26M  23.34M      70.30 MB   352.00 MB      tpcds_10000_parquet.customer_parquet
Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
5 files changed, 68 insertions(+), 8 deletions(-)
git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/17749/3
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
Posted by "pengdou (Code Review)" <ge...@cloudera.org>.
pengdou has abandoned this change. ( http://gerrit.cloudera.org:8080/17749 )
Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................
Abandoned
duplicated
--
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>