You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org> on 2023/05/10 14:14:54 UTC

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Gergely Fürnstáhl has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19850


Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Add fe/thrift changes to the patchset (now its the original version
copied from PHJ)
 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
27 files changed, 2,140 insertions(+), 114 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12987/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 May 2023 14:36:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 7:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12997/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 7
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 13:49:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

(12 comments)

Went over the delta since my last review. Tomorrow I'll do another full round.

http://gerrit.cloudera.org:8080/#/c/19850/22//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/22//COMMIT_MSG@14
PS22, Line 14: by
             : this node.
What do you mean by 'by this node'?


http://gerrit.cloudera.org:8080/#/c/19850/22//COMMIT_MSG@21
PS22, Line 21: paralel
parallel


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@98
PS22, Line 98: them and
             : ///   stores only
formatting is weird. And at L102 as well.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@104
PS22, Line 104: filtering
of the delete files.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@104
PS22, Line 104: data skew
due to the imbalance in the number of deleted rows corresponding to different data files.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@105
PS22, Line 105: than using a more
              : ///                even hashing
than hashing based on both file_path and position.

Though I'm not sure if we want to talk about this here. Probably explaining that rows are hashed based on file paths so each file is processed by an assigned executor is enough.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@171
PS22, Line 171: _
local vars don't need to end with '_'


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@180
PS22, Line 180:       DCHECK(hdfs_table_->IsIcebergTable());
DCHECK can be moved out from the for-loop.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@202
PS22, Line 202: == true
No need for the equality check


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@367
PS22, Line 367: &&
If we moved this to the prev line we could align the predicates nicer. Same goes for the previous &&


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@367
PS22, Line 367:              && current_probe_pos_
              :                  > (*current_delete_row_)[current_deleted_pos_row_id_]
What if the file didn't change and current_probe_pos_ <= (*current_delete_row_)[current_deleted_pos_row_id_]) ? This is currently falls to the slow path.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@489
PS22, Line 489: row ids are not guaranteed between row batches
Can we add a DCHECK somewhere so we can guarantee that they are in ascending order inside row batches? In BROADCAST mode we'll even see only one file per row batch.

In that case we could even have a high-speed path, e.g.:

 if (first position in rowbatch > last position in delete vector ||
     last position in rowbatch < first position in delete vector) {
   just pass through the row batch
}

No need to implement that in this PS, we can just note it for further improvements.



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 17:28:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Peter Rozsa (Code Review)" <ge...@cloudera.org>.
Peter Rozsa has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@128
PS22, Line 128:   virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) override;
'virtual' keywords are redundant


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@173
PS22, Line 173:     DCHECK(tuple_desc_->table_desc() != NULL);
Could be nullptr


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@193
PS22, Line 193:           reinterpret_cast<char*>(expr_results_pool_->Allocate(file_path_str.length()));
Allocate() could result nullptr


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@311
PS22, Line 311:         char* ptr_copy = reinterpret_cast<char*>(expr_results_pool_->Allocate(length));
Allocate() could result nullptr


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.h@42
PS22, Line 42:   virtual Status Init(const TPlanNode& tnode, FragmentState* state) override;
'virtual' keywords are redundant


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@426
PS22, Line 426:   DCHECK(current_probe_row_ != NULL);
Could be nullptr


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@454
PS22, Line 454:     if (current_probe_row_ != NULL) {
Could be nullptr


http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@716
PS22, Line 716:     Analyzer analyzer = ctx_.getRootAnalyzer();
Unused


http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@121
PS22, Line 121:       BinaryPredicate bp = (BinaryPredicate) entry;
Unnecessary cast, entry is already BinaryPredicate



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 12:16:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 13:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/19850/13/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/13/tests/query_test/test_iceberg.py@1116
PS13, Line 1116: e
flake8: E501 line too long (102 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/13/tests/query_test/test_iceberg.py@1117
PS13, Line 1117: r
flake8: E501 line too long (139 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 13
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Jun 2023 14:16:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 25:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13467/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 25
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 01 Jul 2023 08:24:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@260
PS10, Line 260: Status IcebergDeleteBuilder::FinalizeBuild(RuntimeState* state) {
> Maybe we could also have a separate timer for FinalizeBuild() in the profil
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), strval.len);
> If we had StringVal as key in 'data_files_' and 'delete_hash' we wouldn't n
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), strval.len)
> What happens when file_path or position is NULL? It shouldn't be, according
ptr=0, len=0, hash_value will be 0, so we don't crash, produce bad result, added check for it


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@621
PS10, Line 621:     if (lhsHasCompatPartition && rhsHasCompatPartition
              :         && isCompatPartition(leftChildFragment.getDataPartition(),
              :                rightChildFragment.getDataPartition(), lhsJoinExprs, rhsJoinExprs,
              :                analyzer))
> This should be always true for this operator, right? Either we can remove t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@645
PS10, Line 645: lhsHasCompatPartition
> Always true?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@659
PS10, Line 659: rhsHasCompatPartition
> Always true?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@694
PS10, Line 694:     switch (node.getJoinOp()) {
> Do we need this switch-case?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 14:39:14 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 29:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 29
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 09:39:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 24:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.h@212
PS24, Line 212:     void Init(IcebergDeleteBuilder* builder);
              :     void Update(impala::StringValue* file_path, int64_t* probe_pos);
              :     bool IsDeleted() const;
              :     void Delete();
              :     bool NeedCheck() const;
              :     void Clear();
              :     void Reset();
> Please add short comments about the methods.
Done


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.h@232
PS24, Line 232: we can pass through the rest
> We can also pass through if the file-position of the last row in the probe 
Good idea, but only in Broadcast mode, I don't think we can assume in distributed mode that even if the file is the same at the beginning and at the end, there isnt a middle block with different file


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@366
PS24, Line 366:   DCHECK(builder_->IsDistributedMode()
              :       || (current_probe_pos_ == INVALID_ROW_ID || current_probe_pos_ < *probe_pos));
> Can we add a similar DCHECK for the PARTITION MODE:
Good idea, but needs a bit different boolean logic, I will do some testing with it


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@413
PS24, Line 413: }
> nit: missing empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@473
PS24, Line 473: current_probe_row_ != nullptr
> Can it be null if we are not at end?
Yes, at the beginning of a batch. This input/output batch logic coming from PHJNode, but that does more with prefetching, and I agree is a but scuffed. Its on my list to revisit it, especially now to spare a NeedCheck call.


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@475
PS24, Line 475: current_probe_row_ != nullptr
> Can it be null?
Same as above


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@506
PS24, Line 506: probe_batch_->num_tuples_per_row()
> I think it's always 1 because there's always a SCAN node at the probe side.
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/be/src/exec/iceberg-delete-node.inline.h
File be/src/exec/iceberg-delete-node.inline.h:

http://gerrit.cloudera.org:8080/#/c/19850/23/be/src/exec/iceberg-delete-node.inline.h@26
PS23, Line 26: 
             : 
             : 
             : 
             : 
> If this is the only inline method then maybe we can just move it to iceberg
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@59
PS23, Line 59:   public boolean isBlockingJoinNode() { return true; }
             : 
             :   @
> nit: fits single line
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@63
PS23, Line 63: 
             :   @Override
             :   public void init(Analyzer analyzer) throws ImpalaException {
             :    
> nit: fits single line
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@78
PS23, Line 78:       newEqJoinConjuncts.add(newEqPred);
> Is this resolved?
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@80
PS23, Line 80: Conjuncts_ = newEqJoinConjuncts;
> t0.matchesType(t1)
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@318
PS23, Line 318: filePathEq =
> nit: in Java code we are using camelCase.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 22:01:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 6:

(25 comments)

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.h@60
PS6, Line 60:       int64_t spillable_buffer_size, int64_t max_row_buffer_size, RuntimeState* state) const;
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.h@126
PS6, Line 126:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.h@171
PS6, Line 171:   /// immediately succeeds. Returns false and sets 'status' if it was unable to append the row, even after spilling
line too long (115 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-builder.cc@161
PS6, Line 161: void IcebergDeleteBuilder::calculateDataFiles(const IcebergDeleteBuilderConfig& sink_config,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@62
PS6, Line 62: /// Operator to perform iceberg delete. 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@93
PS6, Line 93: /// IcebergDeleteState of the builder will drive the iceberg delete algorithm across all the
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@119
PS6, Line 119:   // This enum drives the state machine in GetNext() that processes probe batches and generates
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@124
PS6, Line 124:   // producing a variable number of output rows. When the processing is done EOS is entered,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@126
PS6, Line 126:   // start                     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@134
PS6, Line 134:   //     + PROBING_END_BATCH +------------->+       EOS      | 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.h@161
PS6, Line 161:   bool inline ProcessProbeRow(RowBatch::Iterator* out_batch_iterator, int* remaining_capacity,
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.cc@415
PS6, Line 415:       auto it = builder_->delete_hash.find(std::string(reinterpret_cast<const char*>(strval.ptr), strval.len));
line too long (111 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/exec/iceberg-delete-node.cc@438
PS6, Line 438:           current_delete_id_it_ = std::lower_bound(it->second.begin(),it->second.end(),current_probe_id_.val);
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/service/query-options.cc
File be/src/service/query-options.cc:

http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/service/query-options.cc@1092
PS6, Line 1092:         RETURN_IF_ERROR(GetThriftEnum(value, "optimized iceberg v2 read distribution mode override",
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/be/src/service/query-options.cc@1094
PS6, Line 1094:         query_options->__set_optimized_iceberg_v2_read_distribution_mode_override(enum_type);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/6/common/thrift/Query.thrift@634
PS6, Line 634:   159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/common/thrift/Query.thrift@634
PS6, Line 634:   159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override;  
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
File fe/src/main/java/org/apache/impala/analysis/JoinOperator.java:

http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/analysis/JoinOperator.java@70
PS6, Line 70:         this == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN || this == JoinOperator.ICEBERG_DELETE_JOIN;
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/analysis/JoinOperator.java@91
PS6, Line 91:         this == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN || this == JoinOperator.ICEBERG_DELETE_JOIN;
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@806
PS6, Line 806:     DistributionMode distrMode = computeJoinDistributionMode(node, broadcastCost, partitionCost, rhsDataSize);
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@809
PS6, Line 809:     if (ctx_.getQueryOptions().optimized_iceberg_v2_read_distribution_mode_override != null)
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@814
PS6, Line 814:       } else if (ctx_.getQueryOptions().optimized_iceberg_v2_read_distribution_mode_override ==
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/6/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/6/tests/query_test/test_iceberg.py@1115
PS6, Line 1115: 1
flake8: E501 line too long (93 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/6/tests/query_test/test_iceberg.py@1115
PS6, Line 1115:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/6/tests/query_test/test_iceberg.py@1115
PS6, Line 1115:     cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('optimized_v2_partitioned', 0, 1)) 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 13:25:49 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 6:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12996/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 13:45:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,899 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/20
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

(34 comments)

Thanks, went through the backend, build/probe logic refactor and frontend is still TODO

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h@442
PS10, Line 442:   friend class IcebergDeleteBuilder;
              :   friend class IcebergDeleteNode;
> Are these lines needed?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H
> nit: use #pragma once
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@57
PS10, Line 57: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@58
PS10, Line 58:   /// IcebergDeleteNode.
> nit: fits previous line
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@63
PS10, Line 63: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@110
PS10, Line 110: each probe thread working on the same set of
              : /// partitions
> Obsolete comment, as this operator doesn't partition its inputs.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120
PS10, Line 120: long long
> nit: we use int64_t instead of long long in the Impala code
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120
PS10, Line 120: h;
> nit: missing '_' at and of member name.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@157
PS10, Line 157: hash_partitions_
> There is no 'hash_partitions_'
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@160
PS10, Line 160: that that
> nit: multiple 'that'
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@164
PS10, Line 164: Also used by
> nit: end of sentence is missing
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@58
PS10, Line 58: using strings::Substitute
> nit: common/names.h already has this using.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@95
PS10, Line 95: eq_join_conjuncts
> We could add DCHECK that the size of it is 2.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@127
PS10, Line 127: Hash
> IcebergPositionDelete?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@150
PS10, Line 150: Hash
> IcebergPositionDelete?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@206
PS10, Line 206: boost
> Can we use std::filesystem?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_H
> nit: use #pragma once
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@37
PS10, Line 37: class BloomFilter;
> unused
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@39
PS10, Line 39: class IcebergDeleteNode;
> not needed here
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@78
PS10, Line 78: t
> nit: to early line break here
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@79
PS10, Line 79: build partitions
> This solution doesn't use build partitions AFAICT.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@81
PS10, Line 81: phase
> nit: 'build phase', or 'this phase' instead of 'the phase'
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@89
PS10, Line 89: ///
             : ///
             : ///
> nit: too many empty lines
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@104
PS10, Line 104: virtual
> nit: 'virtual' keyword is redundant here
that is the more common style in the code base, I would keep it that way


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@186
PS10, Line 186: Get the next row batch from the probe (left) side (child(0))
> This operator always gets the next row bratch from the probe side, right?
Yes


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@210
PS10, Line 210:   /// Additional state for flushing build-side data. Needed for
              :   /// separate build.
              :   /// TODO: IMPALA-9411: this could be removed if we attached the buffers.
              :   bool flushed_unattachable_build_buffers_ = false;
> Not relevant for this operator?
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@52
PS10, Line 52: using strings::Substitute
> Unnecessary because of common/names.h
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@142
PS10, Line 142: 
> nit: unnecessary empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@345
PS10, Line 345: (
> nit: I see no closing parenthesis.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@359
PS10, Line 359:   ;
> nit: unnecessary ;
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@392
PS10, Line 392: {
              :       }
> empty block
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h
File be/src/exec/iceberg-delete-node.inline.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H
> nit: use #pragma once
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift@795
PS10, Line 795: of
> nit: off
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift@638
PS10, Line 638: 159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override
> This is good for perf testing for now, but we should limit the number of qu
yes, that is the plan



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 10:29:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#30). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/00000-0-data-gfurnstahl_20230705104815_d51294d1-5658-497c-94f2-54269eb011be-job_16881608248131_0342-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/delete-74487c6141f8028e-98cbf79b00000000_453028296_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/01dc3e0b-fe92-4d60-973b-fcbb58f71be5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/e6680781-452a-41d3-a149-0136fa868069-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-4821756033809199889-1-e6680781-452a-41d3-a149-0136fa868069.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-7963820769190930835-1-01dc3e0b-fe92-4d60-973b-fcbb58f71be5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
49 files changed, 3,712 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/30
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 30
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 31:

(8 comments)

Thanks for the update Gergo, marked some nits. Doing another round soon.

http://gerrit.cloudera.org:8080/#/c/19850/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/30//COMMIT_MSG@24
PS30, Line 24: 
nit: could you add a testing part as well and describe the test that were done in short bullet points?


http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/iceberg-delete-builder.h@104
PS30, Line 104: ///  data files.
nit: indentation


http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/iceberg-delete-node.cc@61
PS30, Line 61:   // TODO: IMPALA-12265
nit: could you add a sentence or two that describes this Jira?
Like in L165.


http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/iceberg-delete-node.cc@316
PS30, Line 316: \n
nit: std::endl


http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/partitioned-hash-join-node.cc
File be/src/exec/partitioned-hash-join-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/30/be/src/exec/partitioned-hash-join-node.cc@86
PS30, Line 86: IMPALA-12265
nit: could you add a sentence that describes this Jira?


http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
File fe/src/main/java/org/apache/impala/analysis/JoinOperator.java:

http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/analysis/JoinOperator.java@108
PS30, Line 108: throw new IllegalStateException("Not implemented");
nit: this is not needed


http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@23
PS30, Line 23: *
nit: we shouldn't import the whole package.


http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/30/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@254
PS30, Line 254: TODO
nit: missing Jira id



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 31
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 12:10:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG@13
PS10, Line 13: intergrate
> nit: integrate
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@305
PS10, Line 305:     BigIntVal id = build_expr_evals_[0]->GetBigIntVal(build_row);
              :     StringVal strval = build_expr_evals_[1]->GetStringVal(build_row);
> This is on a critical code path, and we can write very specific code here. 
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), strval.len);
> If we had StringVal as key in 'data_files_' and 'delete_hash' we wouldn't n
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@311
PS10, Line 311: delete_hash[std::move(filename)]
> Instead of creating a new vector with 0 capacity at the first insertion, pr
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@223
PS10, Line 223:   bool hashed_ = false;
              :   bool matched_ = false;
> Please add comments for the members
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225: long long
> nit: int64_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225:   std::vector<long long>* current_delete_id_vec_;
              :   std::vector<long long>::iterator current_delete_id_it_;
              :   impala_udf::BigIntVal current_probe_id_;
              :   uint8_t* current_filename_ = nullptr;
              :   int current_file_size_;
> Could be moved to a struct.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@230
PS10, Line 230: size_t
> nit: int32_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@231
PS10, Line 231: size_t
> int32_t
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@234
PS10, Line 234:   std::unordered_set<std::string> files_in_batch_;
              :   std::unordered_set<uint8_t*> str_ptr_in_batch_;
              :   std::unordered_map<std::string, std::vector<long long>>::iterator last_it_;
> Could be moved to a struct.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@402
PS10, Line 402:       current_probe_id_ = probe_expr_evals_[0]->GetBigIntVal(current_probe_row_);
              :       StringVal strval = probe_expr_evals_[1]->GetStringVal(current_probe_row_);
> Instead of getting the values via virtual functions, we could just retrieve
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@408
PS10, Line 408:       if (it == builder_->delete_hash.end()) {
              :         matched_ = false;
> Currently we are only scanning data files that have corresponding delete fi
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@416
PS10, Line 416: 
              : 
              :         if (!hashed_)
> nit: else
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@48
PS10, Line 48:  * Hash join between left child (outer) and right child (inner). One child must be the
             :  * plan generated for a table ref. Typically, that is the right child, but due to join
             :  * inversion (for outer/semi/cross joins) it could also be the left child.
> Copy-pasted comment, irrelevant to this operator.
Done


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@83
PS10, Line 83:       if (!t0.matchesType(t1)) {
             :         // With decimal and char types, the child types do not have to match because
             :         // the equality builtin handles it. However, they will not hash correctly so
             :         // insert a cast.
             :         boolean bothDecimal = t0.isDecimal() && t1.isDecimal();
             :         boolean bothString = t0.isStringType() && t1.isStringType();
             :         if (!bothDecimal && !bothString) {
             :           throw new InternalException("Cannot compare " + t0.toSql() + " to " + t1.toSql()
             :               + " in join predicate.");
             :         }
             :         Type compatibleType =
             :             Type.getAssignmentCompatibleType(t0, t1, false, analyzer.isDecimalV2());
             :         if (compatibleType.isInvalid()) {
             :           throw new InternalException(
             :               String.format("Unable create a hash join with equi-join predicate %s "
             :                       + "because the operands cannot be cast without loss of precision. "
             :                       + "Operand types: %s = %s.",
             :                   eqPred.toSql(), t0.toSql(), t1.toSql()));
             :         }
             :         Preconditions.checkState(
             :             compatibleType.isDecimal() || compatibleType.isStringType());
             :         try {
             :           if (!t0.equals(compatibleType)) {
             :             eqPred.setChild(0, eqPred.getChild(0).castTo(compatibleType));
             :           }
             :           if (!t1.equals(compatibleType)) {
             :             eqPred.setChild(1, eqPred.getChild(1).castTo(compatibleType));
             :           }
             :         } catch (AnalysisException e) {
             :           throw new InternalException("Should not happen", e);
             :         }
             :       }
> Unneeded. I think we just need a couple of preconditions that we got the ex
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 12:07:02 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,967 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/11
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 11
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#27). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - revisit resource profile/cost calculations
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
35 files changed, 3,138 insertions(+), 149 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/27
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 27
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 13:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13281/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 13
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Jun 2023 14:38:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Double check java reviews
 - NULL in delete file
 - add FinalizeBuild timer

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,970 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/16
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13299/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 12:29:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.h@128
PS22, Line 128:   virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) override;
> 'virtual' keywords are redundant
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@173
PS22, Line 173:     DCHECK(tuple_desc_->table_desc() != NULL);
> Could be nullptr
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@193
PS22, Line 193:           reinterpret_cast<char*>(expr_results_pool_->Allocate(file_path_str.length()));
> Allocate() could result nullptr
Good catch, fixed, although most of the time we handle this sloppily as I see.


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-builder.cc@311
PS22, Line 311:         char* ptr_copy = reinterpret_cast<char*>(expr_results_pool_->Allocate(length));
> Allocate() could result nullptr
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.h@42
PS22, Line 42:   virtual Status Init(const TPlanNode& tnode, FragmentState* state) override;
> 'virtual' keywords are redundant
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@426
PS22, Line 426:   DCHECK(current_probe_row_ != NULL);
> Could be nullptr
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/be/src/exec/iceberg-delete-node.cc@454
PS22, Line 454:     if (current_probe_row_ != NULL) {
> Could be nullptr
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@716
PS22, Line 716:     Analyzer analyzer = ctx_.getRootAnalyzer();
> Unused
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/22/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@121
PS22, Line 121:       BinaryPredicate bp = (BinaryPredicate) entry;
> Unnecessary cast, entry is already BinaryPredicate
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 14:18:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 8:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12998/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 13:41:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
this node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector paralel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,936 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/21
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,959 insertions(+), 102 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/13
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 13
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 24:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@366
PS24, Line 366:   DCHECK(builder_->IsDistributedMode()
              :       || (current_probe_pos_ == INVALID_ROW_ID || current_probe_pos_ < *probe_pos));
> Good idea, but needs a bit different boolean logic, I will do some testing 
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 09:13:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 34: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 34
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 20:32:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 28:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13474/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 28
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 10:32:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13300/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 17
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 15 Jun 2023 15:02:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,899 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/19
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 19
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13416/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:42:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#24). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
32 files changed, 3,117 insertions(+), 108 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/24
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 28:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19850/28/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/28/common/thrift/Query.thrift@646
PS28, Line 646:   
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 28
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 10:13:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#31). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/00000-0-data-gfurnstahl_20230705104815_d51294d1-5658-497c-94f2-54269eb011be-job_16881608248131_0342-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/delete-74487c6141f8028e-98cbf79b00000000_453028296_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/01dc3e0b-fe92-4d60-973b-fcbb58f71be5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/e6680781-452a-41d3-a149-0136fa868069-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-4821756033809199889-1-e6680781-452a-41d3-a149-0136fa868069.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-7963820769190930835-1-01dc3e0b-fe92-4d60-973b-fcbb58f71be5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
49 files changed, 3,711 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/31
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 31
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 34: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 34
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 16:14:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,899 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/17
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 17
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
30 files changed, 2,067 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
30 files changed, 2,052 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/9
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 9
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,962 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/15
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 15
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13357/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 21 Jun 2023 13:06:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 27:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13473/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 27
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 10:10:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19850/23/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/23/be/src/exec/iceberg-delete-node.cc@63
PS23, Line 63:   // TODO: simplify this by ensuring that UseSeparateBuild() is accurate in Init().
> nit: could you create a Jira for this?
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@722
PS23, Line 722: 
> Our operator is not a null-aware anti join, as neither side can be NULL. We
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@752
PS23, Line 752:    + " row
> file path
Done


http://gerrit.cloudera.org:8080/#/c/19850/23/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@778
PS23, Line 778: ition: cost=" + Long.toString(partitionCost));
              :       LOG.trace("lhs card=" + Long.toString(lhsTree.getCardinality())
> We only need to compute this if the user haven't specified it via the query
Done


http://gerrit.cloudera.org:8080/#/c/19850/22/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/22/tests/query_test/test_iceberg.py@1121
PS22, Line 1121: #
> flake8: E265 block comment should start with '# '
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 12:29:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 31:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13480/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 31
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 12:00:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#33). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Testing:
  - Duplicated related planner tests to run both with new operator and
hash join
  - Added a dimension for e2e tests to run both with new operator and
hash join
  - Added new multiblock tests to verify assumptions used in new
operator to optimize probing
  - Added new test with BATCH_SIZE=2 to verify in/out batch handling
with new operator

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/00000-0-data-gfurnstahl_20230705104815_d51294d1-5658-497c-94f2-54269eb011be-job_16881608248131_0342-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/delete-74487c6141f8028e-98cbf79b00000000_453028296_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/01dc3e0b-fe92-4d60-973b-fcbb58f71be5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/e6680781-452a-41d3-a149-0136fa868069-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-4821756033809199889-1-e6680781-452a-41d3-a149-0136fa868069.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-7963820769190930835-1-01dc3e0b-fe92-4d60-973b-fcbb58f71be5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
49 files changed, 3,555 insertions(+), 156 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/33
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 33
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13337/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 19
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Jun 2023 16:05:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,967 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/12
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
30 files changed, 2,052 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/10
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13044/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 11:17:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#25). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
34 files changed, 3,180 insertions(+), 141 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/25
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 25
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 24:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13460/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 22:16:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 33: Code-Review+2

This is a huge change :) Thanks for the work, Gergo!


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 33
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 16:12:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 33:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13482/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 33
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 16:18:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 31:

(15 comments)

I just scratched the surface of the patch in overall. I relied on Zoli already given +1 on PS24 so apart from a quick run-through I checked the diffs after that.

http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@16
PS31, Line 16: // under the License.
I started checking the includes and forward declarations in this file and apparently a lot of them aren't used here. Could you please re-visit these and get rid of the ones you don't use in this header?


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@20
PS31, Line 20: #include <deque>
This is not used.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@42
PS31, Line 42: class CyclicBarrier;
This class is not used in the header.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@46
PS31, Line 46: class ScalarExpr;
This class is not used in the header.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@47
PS31, Line 47: class ScalarExprEvaluator;
This class is not used in the header.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@134
PS31, Line 134:   // The following functions are used only by IcebergDeleteNode.
I personally don't really like comments like this. Currently I get that these are only used by IcebergDeleteNode, but it's pretty easy to write code in the future that would make this comment obsolete.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.h@137
PS31, Line 137:   
typo


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-builder.cc@21
PS31, Line 21: #include <numeric>
Is this used?

Similarly to the .h file, could you please check these includes if they are used anymore?


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-node.h@21
PS31, Line 21: #include <list>
Apparently, from this import block you don't really use any of these in the header. On the other hand, you use std::vector that is not included.


http://gerrit.cloudera.org:8080/#/c/19850/31/be/src/exec/iceberg-delete-node.h@28
PS31, Line 28: #include "exec/exec-node.h"
If I'm not mistaken, the only use of this header is that you refer to ExecNode** in L44. Can't that be a forward declaration and get rid of this include?


http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@32
PS31, Line 32: import org.apache.impala.thrift.TJoinDistributionMode;
not used


http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@30
PS31, Line 30: import org.apache.impala.common.AnalysisException;
not used


http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@32
PS31, Line 32: import org.apache.impala.common.InternalException;
not used


http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@42
PS31, Line 42: import com.google.common.base.Joiner;
I don't think this is used either.


http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

http://gerrit.cloudera.org:8080/#/c/19850/31/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java@58
PS31, Line 58: JOIN and operator
This part of the sentence isn't meaningful I think. extra 'and'?



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 31
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 13:38:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13101/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 10:50:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 12:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/iceberg-delete-builder.h@171
PS12, Line 171:   std::vector<ScalarExprEvaluator*> build_expr_evals_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/iceberg-delete-builder.cc@134
PS12, Line 134:         -1, sink_config, ConstructBuilderName("IcebergDelete", sink_config.join_node_id_), state),
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/join-op.h
File be/src/exec/join-op.h:

http://gerrit.cloudera.org:8080/#/c/19850/12/be/src/exec/join-op.h@29
PS12, Line 29:       || join_op == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN || join_op == TJoinOp::ICEBERG_DELETE_JOIN;
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/12/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/12/tests/query_test/test_iceberg.py@1116
PS12, Line 1116: e
flake8: E501 line too long (102 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/12/tests/query_test/test_iceberg.py@1117
PS12, Line 1117: r
flake8: E501 line too long (139 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 10:31:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/iceberg-delete-builder.h@171
PS11, Line 171:   std::vector<ScalarExprEvaluator*> build_expr_evals_;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/iceberg-delete-builder.cc@134
PS11, Line 134:         -1, sink_config, ConstructBuilderName("IcebergDelete", sink_config.join_node_id_), state),
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/join-op.h
File be/src/exec/join-op.h:

http://gerrit.cloudera.org:8080/#/c/19850/11/be/src/exec/join-op.h@29
PS11, Line 29:       || join_op == TJoinOp::NULL_AWARE_LEFT_ANTI_JOIN || join_op == TJoinOp::ICEBERG_DELETE_JOIN;
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/11/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/11/tests/query_test/test_iceberg.py@1115
PS11, Line 1115: e
flake8: E501 line too long (102 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/11/tests/query_test/test_iceberg.py@1116
PS11, Line 1116: r
flake8: E501 line too long (139 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 11
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 10:30:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

(47 comments)

I looked mostly at the backend.

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/10//COMMIT_MSG@13
PS10, Line 13: intergrate
nit: integrate


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h
File be/src/exec/hash-table.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/hash-table.h@442
PS10, Line 442:   friend class IcebergDeleteBuilder;
              :   friend class IcebergDeleteNode;
Are these lines needed?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_BUILDER_H
nit: use #pragma once


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@57
PS10, Line 57: a
nit: an


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@58
PS10, Line 58:   /// IcebergDeleteNode.
nit: fits previous line


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@63
PS10, Line 63: a
nit: an


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@110
PS10, Line 110: each probe thread working on the same set of
              : /// partitions
Obsolete comment, as this operator doesn't partition its inputs.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120
PS10, Line 120: long long
nit: we use int64_t instead of long long in the Impala code


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@120
PS10, Line 120: h;
nit: missing '_' at and of member name.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@157
PS10, Line 157: hash_partitions_
There is no 'hash_partitions_'


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@160
PS10, Line 160: that that
nit: multiple 'that'


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.h@164
PS10, Line 164: Also used by
nit: end of sentence is missing


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@58
PS10, Line 58: using strings::Substitute
nit: common/names.h already has this using.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@95
PS10, Line 95: eq_join_conjuncts
We could add DCHECK that the size of it is 2.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@127
PS10, Line 127: Hash
IcebergPositionDelete?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@150
PS10, Line 150: Hash
IcebergPositionDelete?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@206
PS10, Line 206: boost
Can we use std::filesystem?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@260
PS10, Line 260: Status IcebergDeleteBuilder::FinalizeBuild(RuntimeState* state) {
Maybe we could also have a separate timer for FinalizeBuild() in the profile.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@305
PS10, Line 305:     BigIntVal id = build_expr_evals_[0]->GetBigIntVal(build_row);
              :     StringVal strval = build_expr_evals_[1]->GetStringVal(build_row);
This is on a critical code path, and we can write very specific code here. Since we know the rows always hold a file_path and position, we can just get the relevant slots and cast them.

See sample code in https://github.com/apache/impala/blob/master/be/src/exprs/slot-ref.cc


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), strval.len);
If we had StringVal as key in 'data_files_' and 'delete_hash' we wouldn't need to dynamically allocate memory and copy bytes here.

So we would just allocate memory from a MemPool for the filenames, create StringVal objects for them, and use them in both 'data_files_' and 'delete_hash'. A custom hash() function is needed for StringVal, but it should be fairly trivial to implement. This would also track the memory consumed by the file paths at least.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@307
PS10, Line 307:     std::string filename =
              :         std::string(reinterpret_cast<const char*>(strval.ptr), strval.len)
What happens when file_path or position is NULL? It shouldn't be, according to the Iceberg spec, but still, Impala should not crash on invalid input.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-builder.cc@311
PS10, Line 311: delete_hash[std::move(filename)]
Instead of creating a new vector with 0 capacity at the first insertion, probably we could reserve a capacity of 8 (one cache line).


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_H
nit: use #pragma once


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@37
PS10, Line 37: class BloomFilter;
unused


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@39
PS10, Line 39: class IcebergDeleteNode;
not needed here


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@78
PS10, Line 78: t
nit: to early line break here


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@79
PS10, Line 79: build partitions
This solution doesn't use build partitions AFAICT.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@81
PS10, Line 81: phase
nit: 'build phase', or 'this phase' instead of 'the phase'


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@89
PS10, Line 89: ///
             : ///
             : ///
nit: too many empty lines


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@104
PS10, Line 104: virtual
nit: 'virtual' keyword is redundant here


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@186
PS10, Line 186: Get the next row batch from the probe (left) side (child(0))
This operator always gets the next row bratch from the probe side, right?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@210
PS10, Line 210:   /// Additional state for flushing build-side data. Needed for
              :   /// separate build.
              :   /// TODO: IMPALA-9411: this could be removed if we attached the buffers.
              :   bool flushed_unattachable_build_buffers_ = false;
Not relevant for this operator?


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@223
PS10, Line 223:   bool hashed_ = false;
              :   bool matched_ = false;
Please add comments for the members


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225: long long
nit: int64_t


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@225
PS10, Line 225:   std::vector<long long>* current_delete_id_vec_;
              :   std::vector<long long>::iterator current_delete_id_it_;
              :   impala_udf::BigIntVal current_probe_id_;
              :   uint8_t* current_filename_ = nullptr;
              :   int current_file_size_;
Could be moved to a struct.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@230
PS10, Line 230: size_t
nit: int32_t


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@231
PS10, Line 231: size_t
int32_t


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.h@234
PS10, Line 234:   std::unordered_set<std::string> files_in_batch_;
              :   std::unordered_set<uint8_t*> str_ptr_in_batch_;
              :   std::unordered_map<std::string, std::vector<long long>>::iterator last_it_;
Could be moved to a struct.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@52
PS10, Line 52: using strings::Substitute
Unnecessary because of common/names.h


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@142
PS10, Line 142: 
nit: unnecessary empty line


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@345
PS10, Line 345: (
nit: I see no closing parenthesis.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@359
PS10, Line 359:   ;
nit: unnecessary ;


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@392
PS10, Line 392: {
              :       }
empty block


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@402
PS10, Line 402:       current_probe_id_ = probe_expr_evals_[0]->GetBigIntVal(current_probe_row_);
              :       StringVal strval = probe_expr_evals_[1]->GetStringVal(current_probe_row_);
Instead of getting the values via virtual functions, we could just retrieve the relevant slots directly.

Also, since these are coming from the probe side, where we are using virtual columns (INPUT__FILE__NAME, FILE__POSITION), there's no need for NULL-checking.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@408
PS10, Line 408:       if (it == builder_->delete_hash.end()) {
              :         matched_ = false;
Currently we are only scanning data files that have corresponding delete files, so this should be always false, right?

However, this could be true when we will have support for equality deletes. But anyway, maybe we could put it into UNLIKELY macro at least.


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.cc@416
PS10, Line 416: 
              : 
              :         if (!hashed_)
nit: else


http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h
File be/src/exec/iceberg-delete-node.inline.h:

http://gerrit.cloudera.org:8080/#/c/19850/10/be/src/exec/iceberg-delete-node.inline.h@18
PS10, Line 18: #ifndef IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H
             : #define IMPALA_EXEC_ICEBERG_DELETE_NODE_INLINE_H
nit: use #pragma once



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 16 May 2023 12:57:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 9:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13043/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 9
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 11:11:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 10:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/ImpalaService.thrift@795
PS10, Line 795: of
nit: off


http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/10/common/thrift/Query.thrift@638
PS10, Line 638: 159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override
This is good for perf testing for now, but we should limit the number of query options, so it would be good if we could remove this from the final PS.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@606
PS10, Line 606:   private PlanFragment createPartitionedIcebergDeleteFragment(IcebergDeleteNode node,
There's a lot of code shared between this and createPartitionedHashJoinFragment(). Can we refactor the common parts to new private methods?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@621
PS10, Line 621:     if (lhsHasCompatPartition && rhsHasCompatPartition
              :         && isCompatPartition(leftChildFragment.getDataPartition(),
              :                rightChildFragment.getDataPartition(), lhsJoinExprs, rhsJoinExprs,
              :                analyzer))
This should be always true for this operator, right? Either we can remove this and the unnecessary parameters, or turn it into a precondition.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@645
PS10, Line 645: lhsHasCompatPartition
Always true?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@659
PS10, Line 659: rhsHasCompatPartition
Always true?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@694
PS10, Line 694:     switch (node.getJoinOp()) {
Do we need this switch-case?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@725
PS10, Line 725:   private PlanFragment createIcebergDeleteFragment(IcebergDeleteNode node,
It shares a lot of code with createHashJoinFragment(). Can we refactor the common parts?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java@767
PS10, Line 767: LOG.trace(
Why emitting this on a new log line?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
File fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java:

http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@48
PS10, Line 48:  * Hash join between left child (outer) and right child (inner). One child must be the
             :  * plan generated for a table ref. Typically, that is the right child, but due to join
             :  * inversion (for outer/semi/cross joins) it could also be the left child.
Copy-pasted comment, irrelevant to this operator.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@83
PS10, Line 83:       if (!t0.matchesType(t1)) {
             :         // With decimal and char types, the child types do not have to match because
             :         // the equality builtin handles it. However, they will not hash correctly so
             :         // insert a cast.
             :         boolean bothDecimal = t0.isDecimal() && t1.isDecimal();
             :         boolean bothString = t0.isStringType() && t1.isStringType();
             :         if (!bothDecimal && !bothString) {
             :           throw new InternalException("Cannot compare " + t0.toSql() + " to " + t1.toSql()
             :               + " in join predicate.");
             :         }
             :         Type compatibleType =
             :             Type.getAssignmentCompatibleType(t0, t1, false, analyzer.isDecimalV2());
             :         if (compatibleType.isInvalid()) {
             :           throw new InternalException(
             :               String.format("Unable create a hash join with equi-join predicate %s "
             :                       + "because the operands cannot be cast without loss of precision. "
             :                       + "Operand types: %s = %s.",
             :                   eqPred.toSql(), t0.toSql(), t1.toSql()));
             :         }
             :         Preconditions.checkState(
             :             compatibleType.isDecimal() || compatibleType.isStringType());
             :         try {
             :           if (!t0.equals(compatibleType)) {
             :             eqPred.setChild(0, eqPred.getChild(0).castTo(compatibleType));
             :           }
             :           if (!t1.equals(compatibleType)) {
             :             eqPred.setChild(1, eqPred.getChild(1).castTo(compatibleType));
             :           }
             :         } catch (AnalysisException e) {
             :           throw new InternalException("Should not happen", e);
             :         }
             :       }
Unneeded. I think we just need a couple of preconditions that we got the expected predicates.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@151
PS10, Line 151:   /**
              :    * Helper to get the equi-join conjuncts in a thrift representation.
              :    */
              :   public List<TEqJoinCondition> getThriftEquiJoinConjuncts() {
              :     List<TEqJoinCondition> equiJoinConjuncts = new ArrayList<>(eqJoinConjuncts_.size());
              :     for (Expr entry : eqJoinConjuncts_) {
              :       BinaryPredicate bp = (BinaryPredicate) entry;
              :       TEqJoinCondition eqJoinCondition = new TEqJoinCondition(
              :           bp.getChild(0).treeToThrift(), bp.getChild(1).treeToThrift(),
              :           bp.getOp() == BinaryPredicate.Operator.NOT_DISTINCT);
              : 
              :       equiJoinConjuncts.add(eqJoinCondition);
              :     }
              :     return equiJoinConjuncts;
              :   }
Could be moved to JoinNode?


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@216
PS10, Line 216:       // The memory of the data stored in hash table and
              :       // the memory of the hash table‘s structure
              :       perBuildInstanceDataBytes = (long) Math.ceil(rhsCard * getChild(1).getAvgRowSize()
              :           + BitUtil.roundUpToPowerOf2((long) Math.ceil(3 * rhsCard / 2))
              :               * PlannerContext.SIZE_OF_BUCKET);
              :       if (rhsNdv > 1 && rhsNdv < rhsCard) {
              :         perBuildInstanceDataBytes +=
              :             (rhsCard - rhsNdv) * PlannerContext.SIZE_OF_DUPLICATENODE;
              :       }
Needs new calculation.


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@235
PS10, Line 235: joinOp_ == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN
Always false


http://gerrit.cloudera.org:8080/#/c/19850/10/fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java@264
PS10, Line 264:     if (joinOp_ == JoinOperator.NULL_AWARE_LEFT_ANTI_JOIN) {
              :       // Only one of the NAAJ probe streams is written at a time, so it needs a max-sized
              :       // buffer. If the build is integrated, we already have a max sized buffer accounted
              :       // for in the build reservation.
              :       perInstanceProbeMinMemReservation =
              :           hasSeparateBuild() ? maxRowBufferSize + bufferSize : bufferSize * 2;
              :     }
Dead code



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 May 2023 12:35:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
30 files changed, 2,054 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19850/7/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/7/common/thrift/Query.thrift@634
PS7, Line 634:   159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/7/common/thrift/Query.thrift@634
PS7, Line 634:   159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override;  
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/7/tests/query_test/test_iceberg.py@1115
PS7, Line 1115: 1
flake8: E501 line too long (93 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/7/tests/query_test/test_iceberg.py@1115
PS7, Line 1115:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/7/tests/query_test/test_iceberg.py@1115
PS7, Line 1115:     cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('optimized_v2_partitioned', 0, 1)) 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 7
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 11 May 2023 13:28:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 4:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-builder.h@180
PS4, Line 180:   /// immediately succeeds. Returns false and sets 'status' if it was unable to append the row, even after spilling
line too long (115 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@64
PS4, Line 64: /// Operator to perform iceberg delete. 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@95
PS4, Line 95: /// IcebergDeleteState of the builder will drive the iceberg delete algorithm across all the
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@121
PS4, Line 121:   // This enum drives the state machine in GetNext() that processes probe batches and generates
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@126
PS4, Line 126:   // producing a variable number of output rows. When the processing is done EOS is entered,
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@128
PS4, Line 128:   // start                     
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@136
PS4, Line 136:   //     + PROBING_END_BATCH +------------->+       EOS      | 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.h@163
PS4, Line 163:   bool inline ProcessProbeRow(RowBatch::Iterator* out_batch_iterator, int* remaining_capacity,
line too long (94 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc@67
PS4, Line 67:       auto hdfs_table_ = static_cast<const HdfsTableDescriptor*>(tuple_desc_->table_desc());      
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc@67
PS4, Line 67:       auto hdfs_table_ = static_cast<const HdfsTableDescriptor*>(tuple_desc_->table_desc());      
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc@79
PS4, Line 79:       //const string& native_file_path = file_path.native();   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc@447
PS4, Line 447:       auto it = builder_->delete_hash.find(std::string(reinterpret_cast<const char*>(strval.ptr), strval.len));
line too long (111 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/be/src/exec/iceberg-delete-node.cc@470
PS4, Line 470:           current_delete_id_it_ = std::lower_bound(it->second.begin(),it->second.end(),current_probe_id_.val);
line too long (110 > 90)


http://gerrit.cloudera.org:8080/#/c/19850/4/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/4/common/thrift/Query.thrift@635
PS4, Line 635:   159: optional bool optimized_v2_partitioned = true;  
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/4/tests/query_test/test_iceberg.py@1115
PS4, Line 1115: 1
flake8: E501 line too long (93 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/19850/4/tests/query_test/test_iceberg.py@1115
PS4, Line 1115:  
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/4/tests/query_test/test_iceberg.py@1115
PS4, Line 1115:     cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('optimized_v2_partitioned', 0, 1)) 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 4
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 10 May 2023 14:15:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#29). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/00000-0-data-gfurnstahl_20230705104815_d51294d1-5658-497c-94f2-54269eb011be-job_16881608248131_0342-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/delete-74487c6141f8028e-98cbf79b00000000_453028296_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/01dc3e0b-fe92-4d60-973b-fcbb58f71be5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/e6680781-452a-41d3-a149-0136fa868069-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-4821756033809199889-1-e6680781-452a-41d3-a149-0136fa868069.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-7963820769190930835-1-01dc3e0b-fe92-4d60-973b-fcbb58f71be5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
49 files changed, 3,713 insertions(+), 151 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/29
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 29
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Testing:
  - Duplicated related planner tests to run both with new operator and
hash join
  - Added a dimension for e2e tests to run both with new operator and
hash join
  - Added new multiblock tests to verify assumptions used in new
operator to optimize probing
  - Added new test with BATCH_SIZE=2 to verify in/out batch handling
with new operator

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Reviewed-on: http://gerrit.cloudera.org:8080/19850
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/00000-0-data-gfurnstahl_20230705104815_d51294d1-5658-497c-94f2-54269eb011be-job_16881608248131_0342-1-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/data/delete-74487c6141f8028e-98cbf79b00000000_453028296_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/01dc3e0b-fe92-4d60-973b-fcbb58f71be5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/e6680781-452a-41d3-a149-0136fa868069-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-4821756033809199889-1-e6680781-452a-41d3-a149-0136fa868069.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/snap-7963820769190930835-1-01dc3e0b-fe92-4d60-973b-fcbb58f71be5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_lineitem_multiblock/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
49 files changed, 3,555 insertions(+), 156 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 35
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 24:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-builder.cc@193
PS24, Line 193:       
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-builder.cc@198
PS24, Line 198:       
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-builder.cc@322
PS24, Line 322:         
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 30 Jun 2023 22:02:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Implemented new exec node to optimize iceberg v2 read performance.

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to intergrate
features of PartitionedHashJoin if needed (partitioning, spilling).

Still TODO:

 - Clean/slice up the join part for easier digestion.
 - Double check code comments

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
M be/src/exec/hash-table.h
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
30 files changed, 2,054 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 7
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19850/9/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/9/common/thrift/Query.thrift@638
PS9, Line 638:   159: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override;
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 9
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 15 May 2023 10:51:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 30:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13479/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 30
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 09:40:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 24:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@473
PS24, Line 473: current_probe_row_ != nullptr
> Yes, at the beginning of a batch. This input/output batch logic coming from
Done


http://gerrit.cloudera.org:8080/#/c/19850/24/be/src/exec/iceberg-delete-node.cc@475
PS24, Line 475: current_probe_row_ != nullptr
> Same as above
Done


http://gerrit.cloudera.org:8080/#/c/19850/28/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/19850/28/common/thrift/Query.thrift@646
PS28, Line 646:   160: optional TJoinDistributionMode optimized_iceberg_v2_read_distribution_mode_override
> line has trailing whitespace
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 24
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 04 Jul 2023 10:24:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#28). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
IcebergDeleteNode node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector parallel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - revisit resource profile/cost calculations
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables-hash-join.test
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
35 files changed, 3,138 insertions(+), 148 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/28
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 28
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 34:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9486/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 34
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Peter Rozsa <pr...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 05 Jul 2023 16:14:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13284/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 15
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Jun 2023 15:36:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: WIP IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/13100/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 11
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 23 May 2023 10:48:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 20:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG@9
PS20, Line 9: Implemented new exec node to optimize iceberg v2 read performance.
Please elaborate on how this new operator works.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@50
PS20, Line 50: Since it is expected to only be
             : /// created and used by IcebergDeletePlanNode only, the DataSinkConfig::Init()
             : /// and DataSinkConfig::CreateSink() are not implemented for it.
I see this comment is also present at PhjBuilderConfig, but I'm not sure if I understand it. Especially that CreateSink() and Init() are implemented.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@57
PS20, Line 57: a
nit: an


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@95
PS20, Line 95: ///
Please write a short summary about how the builder works.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@100
PS20, Line 100: 
nit: unnecessary empty line.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@158
PS20, Line 158: spilling partitions
This builder doesn't partition its data and doesn't spill.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: const
Could be 'static const', or 'static constexpr'


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: _
If it will be a static member then we'll not need the last '_'.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@21
PS20, Line 21: #include <iomanip>
Unused?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@156
PS20, Line 156:         // if(delete_scan_node == nullptr){}
Code line commented out


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@175
PS20, Line 175:       auto tuple_id_ = delete_scan_node->tnode_->hdfs_scan_node.tuple_id;
              :       auto tuple_desc_ = runtime_state_->desc_tbl().GetTupleDescriptor(tuple_id_);
              :       DCHECK(tuple_desc_->table_desc() != NULL);
              :       auto hdfs_table_ =
              :           static_cast<const HdfsTableDescriptor*>(tuple_desc_->table_desc());
              :       HdfsPartitionDescriptor* partition_desc =
              :           hdfs_table_->GetPartition(split.partition_id());
These should be the same for all the splits as there's a single HMS partition for Icebergtables.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@184
PS20, Line 184: hdfs_table_->IsIcebergTable()
This should be a DCHECK I think.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@200
PS20, Line 200:       DCHECK(retval.second == true);
We wn't hit this in RELEASE build, so the next statement would probably crash Impala. Probably it would be better to return a Status error.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@289
PS20, Line 289: inline bool IcebergDeleteBuilder::AppendRow(
              :     BufferedTupleStream* stream, TupleRow* row, Status* status) {
              :   if (LIKELY(stream->AddRow(row, status))) return true;
              :   return false;
              : }
Unused?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@331
PS20, Line 331: auto it = deleted_rows_.find(*file_path);
nit: for readability, can we move this out of the if stmt?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@61
PS20, Line 61: s
Isn't it only one hash table?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@68
PS20, Line 68:  
+ 'that'?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@81
PS20, Line 81: 
Redundant empty line


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@88
PS20, Line 88: virtual
'virtual' keywords are redundant here


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@106
PS20, Line 106: ,
nit: .?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@119
PS20, Line 119: 
Unnecessary empty line


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@141
PS20, Line 141: the
multiple 'the'


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@212
PS20, Line 212: const
Can be 'static const' or 'constexpr'. Also the last '_' is not needed in this case.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@25
PS20, Line 25: #include "codegen/llvm-codegen.h"
Unused


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@163
PS20, Line 163: }
              : v
missing empty line


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@166
PS20, Line 166: attached attached
multiple 'attached'


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@218
PS20, Line 218:   // Putting SCOPED_TIMER in the IR version of ProcessProbeBatch() causes weird exception
              :   // handling IR in the xcompiled function, so call it here instead.
This operator doesn't do codegen.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@333
PS20, Line 333: namespace impala {
This line could be moved to L46 where we have 'using namespace impala'


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@347
PS20, Line 347:   current_probe_pos_ = *probe_pos;
There could be a fast path when file_path == previous_file_path_


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@354
PS20, Line 354: current_file_path_ != previous_file_path_
Why do we need both 'current_file_path_' when we also have the parameter 'file_path'?


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@380
PS20, Line 380: current_deleted_pos_row_id_
To me it seems like that the code has the assumption that the file positions are coming in strictly ascending order, even across row batch boundaries, but this might not be true when a file have multiple splits, and either the followings are true:

 * MT_DOP = 0 # The multi-threaded SCAN operator might produces rows simultaneously from different splits of the same file
 * PARTITIONED mode # If multiple splits of the same file are assigned to different SCAN operators, then the EXCHANGE operators can also get the rows simultaneously from different splits

I think the first case can be resolved by resolving the state for each input row batch. The PARTITIONED case need to be double-checked, to make sure we don't coalesce small row batches in the EXCHANGE RECEIVERs.



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Jun 2023 16:13:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 21:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19850/21/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/21/tests/query_test/test_iceberg.py@1121
PS21, Line 1121: #
flake8: E265 block comment should start with '# '



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 21
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:19:07 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19850/22/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/19850/22/tests/query_test/test_iceberg.py@1121
PS22, Line 1121: #
flake8: E265 block comment should start with '# '



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:20:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 20:

(31 comments)

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19850/20//COMMIT_MSG@9
PS20, Line 9: Implemented new exec node to optimize iceberg v2 read performance.
> Please elaborate on how this new operator works.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h
File be/src/exec/iceberg-delete-builder.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@50
PS20, Line 50: Since it is expected to only be
             : /// created and used by IcebergDeletePlanNode only, the DataSinkConfig::Init()
             : /// and DataSinkConfig::CreateSink() are not implemented for it.
> I see this comment is also present at PhjBuilderConfig, but I'm not sure if
Good catch, skimmed through git history, the comment is obsolate now for PHJ too


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@57
PS20, Line 57: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@95
PS20, Line 95: ///
> Please write a short summary about how the builder works.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@100
PS20, Line 100: 
> nit: unnecessary empty line.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@158
PS20, Line 158: spilling partitions
> This builder doesn't partition its data and doesn't spill.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: const
> Could be 'static const', or 'static constexpr'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.h@180
PS20, Line 180: _
> If it will be a static member then we'll not need the last '_'.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@21
PS20, Line 21: #include <iomanip>
> Unused?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@156
PS20, Line 156:         // if(delete_scan_node == nullptr){}
> Code line commented out
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@175
PS20, Line 175:       auto tuple_id_ = delete_scan_node->tnode_->hdfs_scan_node.tuple_id;
              :       auto tuple_desc_ = runtime_state_->desc_tbl().GetTupleDescriptor(tuple_id_);
              :       DCHECK(tuple_desc_->table_desc() != NULL);
              :       auto hdfs_table_ =
              :           static_cast<const HdfsTableDescriptor*>(tuple_desc_->table_desc());
              :       HdfsPartitionDescriptor* partition_desc =
              :           hdfs_table_->GetPartition(split.partition_id());
> These should be the same for all the splits as there's a single HMS partiti
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@184
PS20, Line 184: hdfs_table_->IsIcebergTable()
> This should be a DCHECK I think.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@200
PS20, Line 200:       DCHECK(retval.second == true);
> We wn't hit this in RELEASE build, so the next statement would probably cra
IIRC this is not even an issue, as we can read multiple splits of the same file.


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@289
PS20, Line 289: inline bool IcebergDeleteBuilder::AppendRow(
              :     BufferedTupleStream* stream, TupleRow* row, Status* status) {
              :   if (LIKELY(stream->AddRow(row, status))) return true;
              :   return false;
              : }
> Unused?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-builder.cc@331
PS20, Line 331: auto it = deleted_rows_.find(*file_path);
> nit: for readability, can we move this out of the if stmt?
Done, maybe in a few years... :)


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h
File be/src/exec/iceberg-delete-node.h:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@61
PS20, Line 61: s
> Isn't it only one hash table?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@68
PS20, Line 68:  
> + 'that'?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@81
PS20, Line 81: 
> Redundant empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@88
PS20, Line 88: virtual
> 'virtual' keywords are redundant here
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@106
PS20, Line 106: ,
> nit: .?
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@119
PS20, Line 119: 
> Unnecessary empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@141
PS20, Line 141: the
> multiple 'the'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.h@212
PS20, Line 212: const
> Can be 'static const' or 'constexpr'. Also the last '_' is not needed in th
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc
File be/src/exec/iceberg-delete-node.cc:

http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@25
PS20, Line 25: #include "codegen/llvm-codegen.h"
> Unused
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@163
PS20, Line 163: }
              : v
> missing empty line
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@166
PS20, Line 166: attached attached
> multiple 'attached'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@218
PS20, Line 218:   // Putting SCOPED_TIMER in the IR version of ProcessProbeBatch() causes weird exception
              :   // handling IR in the xcompiled function, so call it here instead.
> This operator doesn't do codegen.
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@333
PS20, Line 333: namespace impala {
> This line could be moved to L46 where we have 'using namespace impala'
Done


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@347
PS20, Line 347:   current_probe_pos_ = *probe_pos;
> There could be a fast path when file_path == previous_file_path_
As we discussed. that would needs a full string comparison, which would be somewhat faster on the fast path (we spare a hashing and possible (but unlikely) collision resolutions), but a costly on the slow path. It still worth it if we assume there are bunch of deletes per file


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@354
PS20, Line 354: current_file_path_ != previous_file_path_
> Why do we need both 'current_file_path_' when we also have the parameter 'f
If we use the objects from the unordered map, we can compare the pointers, file_path coming from the Tuple


http://gerrit.cloudera.org:8080/#/c/19850/20/be/src/exec/iceberg-delete-node.cc@380
PS20, Line 380: current_deleted_pos_row_id_
> To me it seems like that the code has the assumption that the file position
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 20
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:21:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#22). ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................

IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

IcebergDeleteNode and IcebergDeleteBuild classes are based on
PartitionedHashJoin counterparts. The actual "join" part of the node is
optimized, while others are kept very similarly, to be able to integrate
features of PartitionedHashJoin if needed (partitioning, spilling).

ICEBERG_DELETE_JOIN is added as a join operator which is used only by
this node.

IcebergDeleteBuild processes the data from the relevant delete files and
stores them in a {file_path: ordered row id vector} hash map.

IcebergDeleteNode tracks the processed file and progresses through the
row id vector paralel to the probe batch to check if a row is deleted
or hashes the probe row's file path and uses binary search to find the
closest row id if it is needed for the check.

Still TODO:
 - iceberg delete node specific compute costs and broadcast/distribution
mode deciding logic
 - multi split testing

Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
---
M be/src/codegen/impala-ir.cc
M be/src/exec/CMakeLists.txt
M be/src/exec/blocking-join-node.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exec-node.cc
A be/src/exec/iceberg-delete-builder.cc
A be/src/exec/iceberg-delete-builder.h
A be/src/exec/iceberg-delete-node.cc
A be/src/exec/iceberg-delete-node.h
A be/src/exec/iceberg-delete-node.inline.h
M be/src/exec/join-builder.h
M be/src/exec/join-op.h
M be/src/runtime/query-state.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/PlanNodes.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/JoinOperator.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinBuildSink.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/NestedLoopJoinNode.java
M testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-planner/queries/PlannerTest/tablesample.test
M tests/query_test/test_iceberg.py
31 files changed, 1,936 insertions(+), 103 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/19850/22
-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19850 )

Change subject: IMPALA-11619: Improve Iceberg V2 reads with a custom Iceberg Position Delete operator
......................................................................


Patch Set 22:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/13417/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I024a61573c83bda5584f243c879d9ff39dd2dcfa
Gerrit-Change-Number: 19850
Gerrit-PatchSet: 22
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 28 Jun 2023 13:42:53 +0000
Gerrit-HasComments: No