Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/08/22 11:16:43 UTC

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#4).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').
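
To illustrate the ANTI JOIN semantics described above, here is a minimal
Python sketch. It is not Impala code: the row layout and the function name
'apply_position_deletes' are assumptions made only for this example. A data
row survives unless its (INPUT__FILE__NAME, FILE__POSITION) pair matches a
('file_path', 'pos') row in the delete files.

```python
from typing import Iterable, List, Tuple

# Hypothetical row shapes, chosen for this sketch only:
#   data row   = (INPUT__FILE__NAME, FILE__POSITION, payload)
#   delete row = (file_path, pos)
DataRow = Tuple[str, int, str]
DeleteRow = Tuple[str, int]


def apply_position_deletes(data_rows: Iterable[DataRow],
                           delete_rows: Iterable[DeleteRow]) -> List[DataRow]:
    """Left anti join: keep data rows that have no matching delete row."""
    deleted = set(delete_rows)
    return [row for row in data_rows if (row[0], row[1]) not in deleted]


data = [("f1.parq", 0, "a"), ("f1.parq", 1, "b"), ("f2.parq", 0, "c")]
deletes = [("f1.parq", 1)]
remaining = apply_position_deletes(data, deletes)
```

Here 'remaining' keeps the rows at ("f1.parq", 0) and ("f2.parq", 0); only
the row at position 1 of "f1.parq" is filtered out.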

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
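
The plan shape above can be mimicked with a small Python sketch. This is
only an illustration, not the IcebergScanPlanner logic: the function name
'scan_with_union' is hypothetical, and the sketch assumes that a data file
"has corresponding delete files" exactly when some delete row references
its path, which simplifies how Iceberg actually associates the two.

```python
from typing import Dict, List, Set, Tuple


def scan_with_union(data_files: Dict[str, List[str]],
                    deletes: Set[Tuple[str, int]]) -> List[str]:
    """Return surviving rows as UNION(plain scan, anti join)."""
    # Assumption: a file needs the join only if a delete row names it.
    referenced = {path for path, _ in deletes}

    # Left UNION operand: files without deletes are scanned directly.
    plain = [row
             for path, rows in data_files.items() if path not in referenced
             for row in rows]

    # Right UNION operand: anti join on (file path, row position).
    joined = [row
              for path, rows in data_files.items() if path in referenced
              for pos, row in enumerate(rows) if (path, pos) not in deletes]

    return plain + joined


files = {"a.parq": ["a0", "a1", "a2"], "b.parq": ["b0", "b1"]}
result = scan_with_union(files, {("a.parq", 1)})
```

In this example only "a.parq" goes through the anti join; "b.parq" is read
by the plain scan, so its rows never pay the cost of the join.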

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic are moved from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
98 files changed, 4,104 insertions(+), 542 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18847

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>