Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2023/05/04 14:49:05 UTC

[Impala-ASF-CR] IMPALA-11877: (part 1) Add support for DELETE statements for UNPARTITIONED Iceberg tables

Hello Tamas Mate, Gabor Kaszab, lipenglin@apache.org, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19776

to look at the new patch set (#5).

Change subject: IMPALA-11877: (part 1) Add support for DELETE statements for UNPARTITIONED Iceberg tables
......................................................................

IMPALA-11877: (part 1) Add support for DELETE statements for UNPARTITIONED Iceberg tables

This patch adds support for DELETE statements on unpartitioned Iceberg
tables. Impala uses the 'merge-on-read' mode with position delete files.
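
To illustrate merge-on-read, here is a minimal Python sketch of how a
reader could apply position deletes. It is not Impala code, and the
function name, file names, and data are hypothetical. Each delete record
names a data file and a row position in that file, and the reader skips
those rows at scan time instead of rewriting the data file.

```python
from collections import defaultdict

def apply_position_deletes(file_path, rows, delete_records):
    """Return the rows of one data file that survive the position deletes.

    rows: list of row values, indexed by their position in the file.
    delete_records: iterable of (file_path, pos) pairs read from delete files.
    """
    # Group deleted positions by the data file they refer to.
    deleted = defaultdict(set)
    for path, pos in delete_records:
        deleted[path].add(pos)
    gone = deleted.get(file_path, set())
    # Keep only rows whose position is not marked as deleted.
    return [row for pos, row in enumerate(rows) if pos not in gone]

# Hypothetical example data: four rows in one data file, three delete records.
rows = ["a", "b", "c", "d"]
deletes = [("data/f1.parquet", 1), ("data/f2.parquet", 0), ("data/f1.parquet", 3)]
live_rows = apply_position_deletes("data/f1.parquet", rows, deletes)
```

The data file itself is left untouched; only the delete records decide
which rows a scan returns.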

The patch reuses the existing IcebergPositionDeleteTable as the target
table of the DELETE statements, because this table already has the same
schema as position delete files, even with correct Iceberg field IDs.

The patch rewrites DELETE statements into INSERT statements, e.g.:

from:
 DELETE FROM ice_t WHERE id = 42;

to:
 INSERT INTO ice_t-POSITION-DELETE
 SELECT INPUT__FILE__NAME, FILE__POSITION
 FROM ice_t
 WHERE id = 42;

Position delete files need to be ordered by (file_path, pos), so
we add an extra SORT node before the table sink operator.
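
The ordering requirement can be sketched as follows. This is a
hypothetical Python illustration, not the planner's SORT node: it orders
(file_path, pos) pairs by file path first and by position second, which
is the order in which the rows should reach the sink. The function name
and sample records are assumptions made for the example.

```python
def sort_delete_records(records):
    """Order (file_path, pos) pairs by file path, then by position."""
    return sorted(records, key=lambda rec: (rec[0], rec[1]))

# Hypothetical delete records produced by the rewritten SELECT.
unsorted_records = [
    ("data/f2.parquet", 7),
    ("data/f1.parquet", 12),
    ("data/f1.parquet", 3),
]
ordered = sort_delete_records(unsorted_records)
```

After sorting, all positions for one data file are adjacent and
ascending, so the writer can emit them without buffering.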

In the backend the patch adds a new table sink operator, the
IcebergDeleteSink. It writes the incoming rows (file_path, position) to
delete files. It reuses a lot of code from HdfsTableSink, so this patch
moves the common code to the new common base class: TableSinkBase.

The coordinator then collects the written delete files and invokes
UpdateCatalog to finalize the DELETE statement.

The Catalog then uses Iceberg APIs to create a new snapshot with the
created delete files. It also validates that no conflicting data files
were written since the operation started.

Testing:
 * added planner test
 * e2e tests
 * interop test between Impala and Hive

Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b
---
M be/src/exec/CMakeLists.txt
M be/src/exec/data-sink.cc
M be/src/exec/file-metadata-utils.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/exec/hdfs-table-writer.cc
M be/src/exec/hdfs-table-writer.h
A be/src/exec/iceberg-delete-sink.cc
A be/src/exec/iceberg-delete-sink.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/exec/parquet/hdfs-parquet-table-writer.h
A be/src/exec/table-sink-base.cc
A be/src/exec/table-sink-base.h
M be/src/runtime/coordinator.cc
M be/src/runtime/descriptors.cc
M be/src/service/client-request-state.cc
M common/thrift/CatalogObjects.thrift
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Query.thrift
M common/thrift/Types.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/VirtualColumn.java
A fe/src/main/java/org/apache/impala/planner/IcebergDeleteSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/IcebergCatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeModifyStmtsTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-delete.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-delete.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
M tests/query_test/test_iceberg.py
41 files changed, 1,734 insertions(+), 448 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/19776/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic933b2295abe54b46d2a736961219988ff42915b
Gerrit-Change-Number: 19776
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@apache.org>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>