Posted to reviews@impala.apache.org by "pengdou (Code Review)" <ge...@cloudera.org> on 2021/08/04 06:21:59 UTC

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

pengdou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17749 )


Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................

IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

On a 50 GB TPC-DS test dataset, the test query is:
select max(c_customer_sk),
    ndv(c_customer_id),
    ndv(c_salutation),
    ndv(c_first_name),
    ndv(c_last_name)
from
    (select
        c_customer_sk,
        c_customer_id,
        c_salutation,
        c_first_name,
        c_last_name
    from
        customer_parquet
    union all
    select
        c_customer_sk,
        c_customer_id,
        c_salutation,
        c_first_name,
        c_last_name
    from
        customer_kudu
) t

The test results are as follows:

Official solution (Kudu strings are filled with 4 bytes of padding, so only the HDFS table can pass through):
Operator              #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
----------------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                   1      1    0.000ns    0.000ns                        4.01 MB        4.00 MB
05:AGGREGATE               1      1    0.000ns    0.000ns       1           1   16.00 KB       16.00 KB  FINALIZE
04:EXCHANGE                1      1    0.000ns    0.000ns       3           1   32.00 KB       16.00 KB  UNPARTITIONED
F02:EXCHANGE SENDER        3      3    0.000ns    0.000ns                        24.00 B              0
03:AGGREGATE               3      3  460.345ms  481.012ms       3           1    1.28 MB       16.00 KB
00:UNION                   3      3  261.673ms  282.007ms  24.51M      23.34M  219.00 KB              0
|--02:SCAN KUDU            3      3  835.022ms  885.023ms  12.26M          -1    5.04 MB        7.50 MB  tpcds_10000_parquet.customer_kudu
01:SCAN HDFS               3      3  133.003ms  139.003ms  12.26M      23.34M   70.34 MB      352.00 MB  tpcds_10000_parquet.customer_parquet
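
To illustrate why padded Kudu string slots block passthrough, here is a minimal sketch of the two string-slot layouts. It assumes the sizes given by the flag defaults quoted later in this thread (12 bytes in use per string slot, 4 bytes of padding in the Kudu layout). The names kStringBytesInUse, kKuduStringPadding, StringSlotOffsets and StringBytesPerRow are hypothetical and do not come from the patch, and the sketch ignores null indicators and non-string columns.

```cpp
#include <cstddef>
#include <vector>

// Assumed sizes, taken from the flag defaults quoted later in this thread.
constexpr std::size_t kStringBytesInUse = 12;
constexpr std::size_t kKuduStringPadding = 4;

// Byte offset of each string slot when `num_slots` string slots are laid out
// back to back. Simplification: no null-indicator bytes, no other column types.
inline std::vector<std::size_t> StringSlotOffsets(std::size_t num_slots,
                                                  std::size_t padding) {
  std::vector<std::size_t> offsets;
  offsets.reserve(num_slots);
  for (std::size_t i = 0; i < num_slots; ++i) {
    offsets.push_back(i * (kStringBytesInUse + padding));
  }
  return offsets;
}

// Total bytes occupied by the string slots of one row.
constexpr std::size_t StringBytesPerRow(std::size_t num_slots,
                                        std::size_t padding) {
  return num_slots * (kStringBytesInUse + padding);
}
```

For the four string columns in the test query, the padded layout puts the slots at offsets 0, 16, 32, 48 (64 bytes) and the unpadded layout at 0, 12, 24, 36 (48 bytes). Under these assumptions the two children produce different row layouts, so the UNION node has to materialize the Kudu rows rather than pass the batch through.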

My solution (remove the string padding in the Impala Kudu scanner so that the Kudu table can also pass through):

Operator              #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                   1      1    0.000ns    0.000ns                       4.01 MB        4.00 MB
05:AGGREGATE               1      1    0.000ns    0.000ns       1           1  16.00 KB       16.00 KB  FINALIZE
04:EXCHANGE                1      1    0.000ns    0.000ns       3           1  32.00 KB       16.00 KB  UNPARTITIONED
F02:EXCHANGE SENDER        3      3    0.000ns    0.000ns                       24.00 B              0
03:AGGREGATE               3      3  435.678ms  480.012ms       3           1   1.28 MB       16.00 KB
00:UNION                   3      3    2.333ms    5.000ms  24.51M      23.34M   4.00 KB              0
|--02:SCAN KUDU            3      3    1s017ms    1s139ms  12.26M          -1   3.75 MB        7.50 MB  tpcds_10000_parquet.customer_kudu
01:SCAN HDFS               3      3  130.336ms  159.004ms  12.26M      23.34M  70.30 MB      352.00 MB  tpcds_10000_parquet.customer_parquet

Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
5 files changed, 63 insertions(+), 8 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/17749/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17749
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9234/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 06:46:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9249/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 06 Aug 2021 03:32:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................


Patch Set 3:

(3 comments)

Thanks for uploading the patch!

http://gerrit.cloudera.org:8080/#/c/17749/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17749/3//COMMIT_MSG@35
PS3, Line 35: the test result as following:
Could you share the elapsed time of these two queries?


http://gerrit.cloudera.org:8080/#/c/17749/3/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17749/3/be/src/exec/kudu-scanner.cc@66
PS3, Line 66: DEFINE_int32(kudu_scanner_string_pading, 4, "string padding in kudu client");
            : DEFINE_int32(string_value_length_in_use, 12, "string value length in use");
We can't adjust these per query if they are defined as startup flags. I think the ideal solution is to detect in the planner whether this optimization can be turned on. We can be conservative at first and only use it for UNION clauses that have both HDFS and Kudu scans.
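
A minimal sketch of the conservative check suggested above, assuming the decision is made per UNION node from the kinds of its child scans. ScanKind and ShouldUseCompactKuduStrings are hypothetical names, not Impala APIs. Impala's planner is written in Java, so this C++ version is only meant to show the logic.

```cpp
#include <vector>

// Hypothetical classification of a UNION operand.
enum class ScanKind { kHdfs, kKudu, kOther };

// Returns true only when the UNION has at least one HDFS scan and at least one
// Kudu scan among its children. The result would be carried in the plan for
// that query instead of being read from a process-wide startup flag.
inline bool ShouldUseCompactKuduStrings(
    const std::vector<ScanKind>& union_children) {
  bool has_hdfs = false;
  bool has_kudu = false;
  for (ScanKind kind : union_children) {
    if (kind == ScanKind::kHdfs) has_hdfs = true;
    if (kind == ScanKind::kKudu) has_kudu = true;
  }
  return has_hdfs && has_kudu;
}
```

With a per-plan decision like this, a query that only scans Kudu keeps the existing padded layout, and only mixed HDFS and Kudu unions opt in.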


http://gerrit.cloudera.org:8080/#/c/17749/3/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
File fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java:

http://gerrit.cloudera.org:8080/#/c/17749/3/fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java@335
PS3, Line 335:           d.setIsNullable(true);
Can we set this only when the optimization takes place? I don't think it will benefit all kinds of queries.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Comment-Date: Sat, 14 Aug 2021 00:48:09 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17749 )

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc
File be/src/exec/kudu-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@476
PS1, Line 476:       copy_length = row_size_in_rowbatch - (varlength_string_slot_offsets_[i] + move_forward_offsets + FLAGS_string_value_length_in_use);
line too long (137 > 90)


http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@478
PS1, Line 478:       copy_length = varlength_string_slot_offsets_[i+1] - varlength_string_slot_offsets_[i];
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17749/1/be/src/exec/kudu-scanner.cc@482
PS1, Line 482:       kuduTuple + varlength_string_slot_offsets_[i] + FLAGS_string_value_length_in_use + move_forward_offsets,
line too long (110 > 90)
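
The flagged lines appear to compute how many bytes to move forward when stripping the padding from each string slot. The sketch below is not the patch's code. It is one way to write that kind of in-place compaction with lines under 90 columns, assuming slot offsets are given in ascending order for the padded layout. RemoveStringPadding and its parameters are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Removes the `padding` bytes that follow the first `in_use` bytes of every
// string slot, compacting the row in place. Returns the new row size.
inline std::size_t RemoveStringPadding(std::uint8_t* row, std::size_t row_size,
                                       const std::vector<std::size_t>& slot_offsets,
                                       std::size_t in_use, std::size_t padding) {
  std::size_t removed = 0;  // Padding bytes squeezed out so far.
  for (std::size_t i = 0; i < slot_offsets.size(); ++i) {
    // First byte after slot i's padding, in the original (padded) layout.
    const std::size_t src = slot_offsets[i] + in_use + padding;
    // The chunk to keep ends where the next slot's padding starts, or at the
    // end of the row for the last slot.
    const bool is_last = (i + 1 == slot_offsets.size());
    const std::size_t end = is_last ? row_size : slot_offsets[i + 1] + in_use;
    removed += padding;
    // Source and destination overlap, so memmove is used rather than memcpy.
    std::memmove(row + src - removed, row + src, end - src);
  }
  return row_size - removed;
}
```

For example, a 40-byte row with an 8-byte fixed-width column followed by two padded string slots at offsets 8 and 24 shrinks to 32 bytes, with the second slot's in-use bytes moved from offset 24 to offset 20.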




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 1
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 04 Aug 2021 06:22:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "pengdou (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17749

to look at the new patch set (#3).

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................

IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

On a 50 GB TPC-DS test dataset, the test query is:
select max(c_customer_sk),
    ndv(c_customer_id),
    ndv(c_salutation),
    ndv(c_first_name),
    ndv(c_last_name)
from
    (select
        c_customer_sk,
        c_customer_id,
        c_salutation,
        c_first_name,
        c_last_name
    from
        customer_parquet
    union all
    select
        c_customer_sk,
        c_customer_id,
        c_salutation,
        c_first_name,
        c_last_name
    from
        customer_kudu
) t

The test results are as follows:

Official solution (Kudu strings are filled with 4 bytes of padding, so only the HDFS table can pass through):
Operator              #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows   Peak Mem  Est. Peak Mem  Detail
----------------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                   1      1    0.000ns    0.000ns                        4.01 MB        4.00 MB
05:AGGREGATE               1      1    0.000ns    0.000ns       1           1   16.00 KB       16.00 KB  FINALIZE
04:EXCHANGE                1      1    0.000ns    0.000ns       3           1   32.00 KB       16.00 KB  UNPARTITIONED
F02:EXCHANGE SENDER        3      3    0.000ns    0.000ns                        24.00 B              0
03:AGGREGATE               3      3  460.345ms  481.012ms       3           1    1.28 MB       16.00 KB
00:UNION                   3      3  261.673ms  282.007ms  24.51M      23.34M  219.00 KB              0
|--02:SCAN KUDU            3      3  835.022ms  885.023ms  12.26M          -1    5.04 MB        7.50 MB  tpcds_10000_parquet.customer_kudu
01:SCAN HDFS               3      3  133.003ms  139.003ms  12.26M      23.34M   70.34 MB      352.00 MB  tpcds_10000_parquet.customer_parquet

My solution (remove the string padding in the Impala Kudu scanner so that the Kudu table can also pass through):

Operator              #Hosts  #Inst   Avg Time   Max Time   #Rows  Est. #Rows  Peak Mem  Est. Peak Mem  Detail
---------------------------------------------------------------------------------------------------------------------------------------------
F03:ROOT                   1      1    0.000ns    0.000ns                       4.01 MB        4.00 MB
05:AGGREGATE               1      1    0.000ns    0.000ns       1           1  16.00 KB       16.00 KB  FINALIZE
04:EXCHANGE                1      1    0.000ns    0.000ns       3           1  32.00 KB       16.00 KB  UNPARTITIONED
F02:EXCHANGE SENDER        3      3    0.000ns    0.000ns                       24.00 B              0
03:AGGREGATE               3      3  435.678ms  480.012ms       3           1   1.28 MB       16.00 KB
00:UNION                   3      3    2.333ms    5.000ms  24.51M      23.34M   4.00 KB              0
|--02:SCAN KUDU            3      3    1s017ms    1s139ms  12.26M          -1   3.75 MB        7.50 MB  tpcds_10000_parquet.customer_kudu
01:SCAN HDFS               3      3  130.336ms  159.004ms  12.26M      23.34M  70.30 MB      352.00 MB  tpcds_10000_parquet.customer_parquet

Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
---
M be/src/exec/kudu-scanner.cc
M be/src/exec/kudu-scanner.h
M fe/src/main/java/org/apache/impala/analysis/SlotDescriptor.java
M fe/src/main/java/org/apache/impala/analysis/TupleDescriptor.java
M fe/src/main/java/org/apache/impala/catalog/Type.java
5 files changed, 68 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/49/17749/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect

Posted by "pengdou (Code Review)" <ge...@cloudera.org>.
pengdou has abandoned this change. ( http://gerrit.cloudera.org:8080/17749 )

Change subject: IMPALA-10785 when union kudu table and hdfs table, union passthrough does not take effect
......................................................................


Abandoned

duplicated

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: I0b5696686f6ddeb7a3959c7786d179b9eaf4692d
Gerrit-Change-Number: 17749
Gerrit-PatchSet: 3
Gerrit-Owner: pengdou <pe...@126.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>