Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2022/08/15 14:10:01 UTC

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18847 )


Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are added as 'hidden
columns' to Iceberg tables. 'Hidden column' is a new concept introduced
by this patch.

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes

Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner.

TODO:
 * better cardinality estimates
 * handling complex types
 * add tests

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
18 files changed, 776 insertions(+), 472 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11261/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 17:16:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#6).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').
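
To illustrate the ANTI JOIN semantics described above, here is a minimal
sketch in plain Java. It is not Impala code: the class PositionDeleteSketch
and the types DataRow and DeleteEntry are hypothetical names invented for
this example. Only the column names INPUT__FILE__NAME, FILE__POSITION,
'file_path' and 'pos' come from the commit message. A data row survives
when no delete entry has the same file path and position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PositionDeleteSketch {
  /** Hypothetical model of a data row plus the two virtual columns. */
  static final class DataRow {
    final String inputFileName;  // models INPUT__FILE__NAME
    final long filePosition;     // models FILE__POSITION
    final String payload;        // stands in for the table's data columns

    DataRow(String inputFileName, long filePosition, String payload) {
      this.inputFileName = inputFileName;
      this.filePosition = filePosition;
      this.payload = payload;
    }
  }

  /** Hypothetical model of one row of a position delete file. */
  static final class DeleteEntry {
    final String filePath;  // models 'file_path'
    final long pos;         // models 'pos'

    DeleteEntry(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  /** Returns the data rows that have no matching (file_path, pos) entry. */
  static List<DataRow> antiJoin(List<DataRow> rows, List<DeleteEntry> deletes) {
    // Build side: group the deleted positions by data file path.
    Map<String, Set<Long>> deletedPositions = new HashMap<>();
    for (DeleteEntry d : deletes) {
      deletedPositions.computeIfAbsent(d.filePath, k -> new HashSet<>()).add(d.pos);
    }
    // Probe side: keep only rows whose (file, position) pair was not deleted.
    List<DataRow> result = new ArrayList<>();
    for (DataRow r : rows) {
      Set<Long> positions = deletedPositions.get(r.inputFileName);
      if (positions == null || !positions.contains(r.filePosition)) {
        result.add(r);
      }
    }
    return result;
  }
}
```

The sketch builds a hash table from the delete entries and probes it with
the data rows, which is an assumption about a reasonable join strategy and
not a statement about how Impala executes the plan.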

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
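
The plan shape in the diagram can be sketched as follows. This is a
simplified illustration in plain Java, not the actual IcebergScanPlanner:
the class IcebergPlanShapeSketch, the method planShape and the string
rendering of the plan are hypothetical. The sketch also assumes that when
no data file has delete files the plan is a plain SCAN, and that when every
data file has delete files the UNION is omitted; the commit message only
describes the mixed case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class IcebergPlanShapeSketch {
  /**
   * Takes a map from data file name to the delete files that reference it
   * and returns a textual rendering of the resulting plan shape.
   */
  static String planShape(Map<String, List<String>> deletesByDataFile) {
    List<String> withoutDeletes = new ArrayList<>();
    List<String> withDeletes = new ArrayList<>();
    // A delete file may apply to several data files, so collect it only once.
    TreeSet<String> deleteFiles = new TreeSet<>();
    for (Map.Entry<String, List<String>> e : deletesByDataFile.entrySet()) {
      if (e.getValue().isEmpty()) {
        withoutDeletes.add(e.getKey());
      } else {
        withDeletes.add(e.getKey());
        deleteFiles.addAll(e.getValue());
      }
    }
    if (withDeletes.isEmpty()) {
      return "SCAN data" + withoutDeletes;
    }
    String antiJoin =
        "ANTI JOIN(SCAN data" + withDeletes + ", SCAN deletes" + deleteFiles + ")";
    if (withoutDeletes.isEmpty()) {
      return antiJoin;
    }
    return "UNION(SCAN data" + withoutDeletes + ", " + antiJoin + ")";
  }
}
```

Splitting the data files this way means that only the files which actually
have deletes pay the cost of the join.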

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support non-relative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
98 files changed, 4,214 insertions(+), 547 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/6

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
lipenglin@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 4: Code-Review+1

(6 comments)

LGTM! Just left a few minor comments.

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@559
PS4, Line 559: dataFiles
nit: This is a pair of List<DataFile> and Set<DeleteFile>, maybe we should rename it.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@129
PS4, Line 129: 2147483545
Just want to know: where does this number come from?


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@35
PS4, Line 35: import org.apache.impala.catalog.IcebergColumn;
            : import org.apache.impala.catalog.IcebergTable;
nit: It may be unnecessary.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@39
PS4, Line 39: import org.apache.impala.catalog.Type;
ditto.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@48
PS4, Line 48: import org.apache.impala.catalog.Type;
nit: it may be unnecessary.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@93
PS4, Line 93: TimeTravelSpec
nit: should be IcebergScanPlanner.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Tue, 23 Aug 2022 03:59:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178
PS5, Line 178:       if (!isDeleteRowsJoin_ || detailLevel.ordinal() >= TExplainLevel.EXTENDED.ordinal()) {
line too long (92 > 90)
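
One possible way to bring such a condition under the 90-character limit is
to name one half of it. The sketch below is only an illustration:
ExplainLevelSketch, ExplainLevel and shouldPrintDetails are hypothetical
stand-ins, and the enum values are assumed for the example, not copied
from the patch.

```java
class ExplainLevelSketch {
  // Hypothetical stand-in for the explain level enum used in the flagged line.
  enum ExplainLevel { MINIMAL, STANDARD, EXTENDED, VERBOSE }

  /** Same boolean expression as the flagged line, split over shorter lines. */
  static boolean shouldPrintDetails(boolean isDeleteRowsJoin, ExplainLevel detailLevel) {
    boolean isExtendedOrAbove =
        detailLevel.ordinal() >= ExplainLevel.EXTENDED.ordinal();
    return !isDeleteRowsJoin || isExtendedOrAbove;
  }
}
```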




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 14:28:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/7/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/7/tests/query_test/test_iceberg.py@804
PS7, Line 804: 
flake8: W391 blank line at end of file




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 13:51:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10:

(6 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@572
PS7, Line 572: Iceberg file '{}
> nit: This is not just for dataFiles anymore, right?
Done


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@579
PS7, Line 579: getFilePathHash(con
> nit: same as above: this is for deleteFiles as well, right?
Done


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@683
PS7, Line 683:      * If 'travelSpec' is null then the current snapshot is being used.
> Could you please add a comment that with travelSpec=null the current snapsh
Done


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@692
PS7, Line 692: 
> Don't you need a try-catch block similarly to L699-706?
Expanded the scope of the try block.


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@718
PS7, Line 718:       // There are no snapshots for the tables created for the first time.
> nit: curly brackets are not needed here.
Done


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@558
PS7, Line 558:         deleteFiles.addAll(scanTask.deletes());
> I think instead of this for loop you can use Set.addAll().
Done
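
For readers unfamiliar with the suggestion, the following minimal sketch
contrasts an element-by-element loop with Set.addAll(). It uses plain
strings in place of Iceberg delete files, and the class AddAllSketch and
its methods are hypothetical; the code that was replaced in IcebergUtil is
not shown in this thread, so the loop variant is an assumption about what
it looked like.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddAllSketch {
  /** Collects delete file names one element at a time. */
  static Set<String> collectWithLoop(List<List<String>> deletesPerTask) {
    Set<String> deleteFiles = new HashSet<>();
    for (List<String> deletes : deletesPerTask) {
      for (String deleteFile : deletes) {
        deleteFiles.add(deleteFile);
      }
    }
    return deleteFiles;
  }

  /** Collects the same names with one addAll() call per task. */
  static Set<String> collectWithAddAll(List<List<String>> deletesPerTask) {
    Set<String> deleteFiles = new HashSet<>();
    for (List<String> deletes : deletesPerTask) {
      deleteFiles.addAll(deletes);
    }
    return deleteFiles;
  }
}
```

Both variants produce the same set; a delete file referenced by several
scan tasks is stored only once in either case.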




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 16:58:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 13:58:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 12:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8521/ DRY_RUN=false



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 11:59:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 12: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 11:59:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#8).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which has
the responsibility of doing a plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
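
The plan shape above can be illustrated with a minimal sketch in plain Python.
It only models the semantics (a scan of data files that have no delete files,
UNIONed with a LEFT ANTI JOIN on file path and position); it is not the
planner code from this patch. The names DataRow, DeleteRow and
scan_with_position_deletes are hypothetical, and the set of data files that
have delete files is derived here from the delete rows themselves, which is a
simplifying assumption.

```python
from typing import List, NamedTuple


class DataRow(NamedTuple):
    """A row from a data file (hypothetical model, not an Impala class)."""
    input_file_name: str  # stands in for the virtual column INPUT__FILE__NAME
    file_position: int    # stands in for the virtual column FILE__POSITION
    value: str


class DeleteRow(NamedTuple):
    """A row from a position delete file: the data columns 'file_path', 'pos'."""
    file_path: str
    pos: int


def scan_with_position_deletes(data_rows: List[DataRow],
                               delete_rows: List[DeleteRow]) -> List[DataRow]:
    # Assumption: a data file "has delete files" if any delete row names it.
    files_with_deletes = {d.file_path for d in delete_rows}
    deleted = {(d.file_path, d.pos) for d in delete_rows}

    # Left child of the UNION: data files without corresponding delete files.
    plain_scan = [r for r in data_rows
                  if r.input_file_name not in files_with_deletes]

    # Right child of the UNION: LEFT ANTI JOIN on (file path, position).
    anti_join = [r for r in data_rows
                 if r.input_file_name in files_with_deletes
                 and (r.input_file_name, r.file_position) not in deleted]

    return plain_scan + anti_join
```

Splitting the scan this way means rows from data files without delete files
never pass through the join at all.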

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
99 files changed, 4,214 insertions(+), 552 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/8

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/9/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/9/tests/query_test/test_iceberg.py@808
PS9, Line 808: 
flake8: W292 no newline at end of file




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 16:48:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#3).

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which has
the responsibility of doing a plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

TODO:
 * better cardinality estimates ()
 * support unrelative collection columns (select item from t.int_array)
 * add tests

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
19 files changed, 1,181 insertions(+), 539 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 4:

(8 comments)

I know it's easier said than done, but separating a measurable amount of refactoring from the new feature helps the review process.

I wish Gerrit had some feature to support this outside the codebase, e.g. creating a virtual diff of a.java:100-250 and b.java:250-400 to show the diff and mark it in another color in the original files.

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@48
PS4, Line 48:  * Iceberg position delete table created on the fly during planning. It belongs to an
nit: "table is created"


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@50
PS4, Line 50: if
nit: "of"


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
File fe/src/main/java/org/apache/impala/catalog/VirtualTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@41
PS4, Line 41: taht
nit: "that"


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@204
PS4, Line 204:           .add(HdfsFileFormat.ICEBERG)
Don't Iceberg tables return the underlying file format?

https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java#L202


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@125
PS4, Line 125: createIcebergScanNode
nit: I think this name is a bit confusing, as we do have an IcebergScanNode class, but this method does not necessarily return that type.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@197
PS4, Line 197: 1000000
magic number


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@202
PS4, Line 202:     JoinNode acidJoin = new HashJoinNode(dataScanNode, deleteScanNode,
             :         /*straight_join=*/true, distributionMode, JoinOperator.LEFT_ANTI_JOIN,
             :         positionJoinConjuncts, /*otherJoinConjuncts=*/Collections.emptyList());
             :     acidJoin.setId(ctx_.getNextNodeId());
             :     acidJoin.init(analyzer_);
             :     acidJoin.setIsAcidJoin();
As I understand it, this is (similar but) unrelated to ACID, so we should not let the names creep in.


http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py@765
PS4, Line 765:   @SkipIfDockerizedCluster.internal_hostname
This could be fixed with the custom operator in the future if we add support for relative URIs, right?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 24 Aug 2022 12:06:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#5).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which has
the responsibility of doing a plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
100 files changed, 4,159 insertions(+), 546 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/5
-- 
To view, visit http://gerrit.cloudera.org:8080/18847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

(17 comments)

Great change! Added some minor comments and will try to do another round in the afternoon.

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@106
PS5, Line 106:                                     
nit: is this our preferred indentation for longer loops?


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
File fe/src/main/java/org/apache/impala/catalog/VirtualTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@25
PS5, Line 25: import java.util.Set;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@34
PS5, Line 34: import org.apache.impala.thrift.TResultSet;
            : import org.apache.impala.thrift.TIcebergPartitionStats;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@37
PS5, Line 37: import org.apache.impala.thrift.TTableStats;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@35
PS5, Line 35: import org.apache.impala.catalog.IcebergColumn;
            : import org.apache.impala.catalog.IcebergTable;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@39
PS5, Line 39: import org.apache.impala.catalog.VirtualColumn;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@1
PS5, Line 1: 
nit: empty line


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@35
PS5, Line 35: import org.apache.iceberg.DataFile;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@88
PS5, Line 88: import com.google.common.collect.Iterables;
Unused import.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@117
PS5, Line 117: getIceTable
Shouldn't this getter be next to the others?


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@133
PS5, Line 133: feFsTable
Unused variable.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@181
PS5, Line 181: ALIAS_COUNTER
We could use the PlanNodeId here as well; it would not grow across query executions.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@198
PS5, Line 198: operator
nit: node


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@212
PS5, Line 212: feTable
Unused variable.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@227
PS5, Line 227: feTable
Unused variable.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@232
PS5, Line 232:                  
nit: indentation.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278
PS5, Line 278: fileDescList
Unused variable.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 10:02:32 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11253/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 12:36:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/datasets/functional/schema_constraints.csv
File testdata/datasets/functional/schema_constraints.csv:

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/datasets/functional/schema_constraints.csv@86
PS10, Line 86: table_format:parquet/none/none
Is the table_format correct for these new ORC-backed Iceberg tables?


http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test:

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test@3
PS10, Line 3: iceberg_v2_no_deletes_orc
Could you please add some info about these new tables into a README? e.g. how they were created, what rows were deleted, etc.


http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test@9
PS10, Line 9: aggregation(SUM, NumRowGroups): 0aggregation(SUM, NumFileMetadataRead): 0
nit: break line in the middle?




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 10:14:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#11).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
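
The plan shape above can be illustrated with a minimal, in-memory sketch.
This is not Impala code: the row types and the function name are
hypothetical, and the set of data files that have deletes is derived here
from the delete records themselves, whereas a real planner would take it
from table metadata. The sketch only shows the semantics of the ANTI JOIN
on (INPUT__FILE__NAME, FILE__POSITION) = (file_path, pos) and of the UNION
with the files that need no join.

```python
from collections import namedtuple

# Hypothetical row shapes for illustration only; these are not Impala classes.
DataRow = namedtuple("DataRow", ["input_file_name", "file_position", "value"])
DeleteRow = namedtuple("DeleteRow", ["file_path", "pos"])


def scan_with_position_deletes(data_rows, delete_rows):
    """Return the data rows that survive the position deletes.

    Mirrors the shape UNION(SCAN data, ANTI JOIN(SCAN data, SCAN deletes)).
    """
    deleted = {(d.file_path, d.pos) for d in delete_rows}
    # Assumption: a data file "has deletes" if any delete record names it.
    files_with_deletes = {d.file_path for d in delete_rows}

    # Left branch: data files without delete records need no join.
    plain_scan = [r for r in data_rows
                  if r.input_file_name not in files_with_deletes]

    # Right branch: left anti join on (file name, position).
    anti_join = [r for r in data_rows
                 if r.input_file_name in files_with_deletes
                 and (r.input_file_name, r.file_position) not in deleted]

    # UNION of the two branches.
    return plain_scan + anti_join


data = [
    DataRow("a.parquet", 0, "x"),
    DataRow("a.parquet", 1, "y"),
    DataRow("b.parquet", 0, "z"),
]
deletes = [DeleteRow("a.parquet", 1)]
result = scan_with_position_deletes(data, deletes)
print([r.value for r in result])
```

In this example only the row at position 1 of a.parquet is dropped;
b.parquet has no delete records, so its rows bypass the join entirely.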

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis.

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/data/00000-0-data-boroknagyz_20220830153046_7c9aaa51-d601-4c5b-a51c-67c1e9e1fe29-job_16618645111570_0001-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/a72290c9-c518-4719-8502-6c83a881de07-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/snap-1041485290740594175-1-a72290c9-c518-4719-8502-6c83a881de07.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/588ee1ca-6a85-4af2-8ba2-e595e71712ba-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/5b14eaa8-83cb-4f71-a473-402f345fa5b5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5359840930115020310-1-5b14eaa8-83cb-4f71-a473-402f345fa5b5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5416468273053855108-1-588ee1ca-6a85-4af2-8ba2-e595e71712ba.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-data-boroknagyz_20220830153712_a4060925-000f-4454-b68b-8f7570d3aa7c-job_16618645111570_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-delete-boroknagyz_20220830153733_8f03a46b-e4a5-4643-896b-a1cc5b09cf2b-job_16618645111571_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/83d17f01-336e-41ab-a791-ffd5f511f6ab-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-1801547319505512253-1-6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-4807054508647143162-1-83d17f01-336e-41ab-a791-ffd5f511f6ab.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155232_f36b3f3e-183a-475a-a7d1-d16898fd4524-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155307_610e0b78-1186-477b-a30b-9943cb4e90e9-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155411_eee97f13-df3d-408c-bcff-8e3b0dcf8fa7-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155430_32c848c9-37ad-4f77-a0fd-0a254d15246c-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/560427d8-9e81-4111-9b40-3aeb73af7049-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/5f58eff5-d9a4-44eb-a208-244d0b6b927b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/666190a5-67e0-431f-b142-7ddf9f933d4f-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/9677edab-12db-429c-aa24-5e3a112caed4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/a8152ff6-a39c-435a-916b-d0df02388331-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/e1f383b4-0b8a-4a19-8c70-4ea23978309d-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5003445199566617082-1-5f58eff5-d9a4-44eb-a208-244d0b6b927b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5936629849848085185-1-e1f383b4-0b8a-4a19-8c70-4ea23978309d.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8476486151350891395-1-9677edab-12db-429c-aa24-5e3a112caed4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8479298697893993480-1-560427d8-9e81-4111-9b40-3aeb73af7049.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8621312932233473501-1-666190a5-67e0-431f-b142-7ddf9f933d4f.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8634123667305136474-2-a8152ff6-a39c-435a-916b-d0df02388331.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
156 files changed, 5,985 insertions(+), 555 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18847

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11185/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18847

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Thu, 18 Aug 2022 17:18:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#4).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for planning Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
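
The row-level effect of this plan can be sketched in a few lines of Python. This is only an illustration of the semantics, not the Impala implementation: the names Row, PositionDelete and scan_with_position_deletes are hypothetical, and the sketch decides which data files have deletes by looking at the delete rows themselves, which is a simplification.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    file_name: str  # stands in for the virtual column INPUT__FILE__NAME
    position: int   # stands in for the virtual column FILE__POSITION
    value: str      # an ordinary data column


class PositionDelete(NamedTuple):
    file_path: str  # 'file_path' column of a position delete file
    pos: int        # 'pos' column of a position delete file


def scan_with_position_deletes(data_rows: List[Row],
                               deletes: List[PositionDelete]) -> List[Row]:
    """Return the rows that survive the position deletes."""
    deleted = {(d.file_path, d.pos) for d in deletes}
    files_with_deletes = {d.file_path for d in deletes}

    # Left UNION branch: data files that no delete refers to are scanned as-is.
    plain = [r for r in data_rows if r.file_name not in files_with_deletes]

    # Right UNION branch: ANTI JOIN on (file name, position) = (file_path, pos).
    anti_joined = [r for r in data_rows
                   if r.file_name in files_with_deletes
                   and (r.file_name, r.position) not in deleted]

    return plain + anti_joined


rows = [Row("a.parquet", 0, "x"), Row("a.parquet", 1, "y"),
        Row("b.parquet", 0, "z")]
deletes = [PositionDelete("a.parquet", 1)]
result = scan_with_position_deletes(rows, deletes)
```

Here only the row at position 1 of a.parquet is dropped; b.parquet never takes part in the join.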

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M fe/src/test/java/org/apache/impala/planner/PlannerTestBase.java
M fe/src/test/java/org/apache/impala/testutil/TestUtils.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
98 files changed, 4,104 insertions(+), 542 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/4
-- 
To view, visit http://gerrit.cloudera.org:8080/18847

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#10).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').
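
As a rough illustration of the ANTI JOIN semantics described above (a
minimal sketch, not the code in this patch), the following Python models
data rows that carry the two virtual columns and delete rows that carry
'file_path' and 'pos'. The class names, the 'payload' column and the
function name are hypothetical and exist only for this example.

```python
from typing import Iterable, List, NamedTuple


class DataRow(NamedTuple):
    input_file_name: str  # stands in for the virtual column INPUT__FILE__NAME
    file_position: int    # stands in for the virtual column FILE__POSITION
    payload: str          # hypothetical user column


class DeleteRow(NamedTuple):
    file_path: str  # 'file_path' column of a position delete file
    pos: int        # 'pos' column of a position delete file


def left_anti_join(data_rows: Iterable[DataRow],
                   delete_rows: Iterable[DeleteRow]) -> List[DataRow]:
    """Keep only the data rows that have no matching (file_path, pos) delete row."""
    deleted = {(d.file_path, d.pos) for d in delete_rows}
    return [r for r in data_rows
            if (r.input_file_name, r.file_position) not in deleted]


data = [
    DataRow("/tbl/data/a.parquet", 0, "x"),
    DataRow("/tbl/data/a.parquet", 1, "y"),
    DataRow("/tbl/data/b.parquet", 0, "z"),
]
deletes = [DeleteRow("/tbl/data/a.parquet", 1)]
surviving = left_anti_join(data, deletes)
```

Only the row at position 1 of a.parquet is dropped here; rows from files
that no delete row mentions pass through unchanged.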

This patch introduces a new class, 'IcebergScanPlanner', which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
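
The plan shape above can be sketched in Python as follows. This is an
illustration under stated assumptions, not IcebergScanPlanner itself:
the input mapping (data file path to the delete files that apply to it),
the tuple-based plan nodes and the function name are all hypothetical.
The two degenerate cases (no delete files at all, or every data file
having delete files) are assumed to collapse to a plain SCAN and a bare
ANTI JOIN respectively.

```python
from typing import Dict, List


def sketch_scan_plan(deletes_by_data_file: Dict[str, List[str]]):
    """Return a nested-tuple plan for the given data file -> delete files mapping."""
    # Split the data files by whether any delete file applies to them.
    without_deletes = sorted(f for f, d in deletes_by_data_file.items() if not d)
    with_deletes = sorted(f for f, d in deletes_by_data_file.items() if d)

    if not with_deletes:
        # Assumption: nothing to filter out, so a plain scan is enough.
        return ("SCAN data", without_deletes)

    delete_files = sorted({d for f in with_deletes
                           for d in deletes_by_data_file[f]})
    anti_join = ("ANTI JOIN",
                 ("SCAN data", with_deletes),
                 ("SCAN deletes", delete_files))

    if not without_deletes:
        # Assumption: every data file has deletes, so no UNION is needed.
        return anti_join

    return ("UNION", ("SCAN data", without_deletes), anti_join)


plan = sketch_scan_plan({
    "a.parquet": [],
    "b.parquet": ["d1.parquet"],
})
```

The point of the split is that rows from data files without delete
files never enter the join at all.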

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic are moved from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support non-relative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/data/00000-0-data-boroknagyz_20220830153046_7c9aaa51-d601-4c5b-a51c-67c1e9e1fe29-job_16618645111570_0001-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/a72290c9-c518-4719-8502-6c83a881de07-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/snap-1041485290740594175-1-a72290c9-c518-4719-8502-6c83a881de07.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/588ee1ca-6a85-4af2-8ba2-e595e71712ba-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/5b14eaa8-83cb-4f71-a473-402f345fa5b5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5359840930115020310-1-5b14eaa8-83cb-4f71-a473-402f345fa5b5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5416468273053855108-1-588ee1ca-6a85-4af2-8ba2-e595e71712ba.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-data-boroknagyz_20220830153712_a4060925-000f-4454-b68b-8f7570d3aa7c-job_16618645111570_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-delete-boroknagyz_20220830153733_8f03a46b-e4a5-4643-896b-a1cc5b09cf2b-job_16618645111571_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/83d17f01-336e-41ab-a791-ffd5f511f6ab-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-1801547319505512253-1-6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-4807054508647143162-1-83d17f01-336e-41ab-a791-ffd5f511f6ab.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155232_f36b3f3e-183a-475a-a7d1-d16898fd4524-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155307_610e0b78-1186-477b-a30b-9943cb4e90e9-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155411_eee97f13-df3d-408c-bcff-8e3b0dcf8fa7-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155430_32c848c9-37ad-4f77-a0fd-0a254d15246c-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/560427d8-9e81-4111-9b40-3aeb73af7049-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/5f58eff5-d9a4-44eb-a208-244d0b6b927b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/666190a5-67e0-431f-b142-7ddf9f933d4f-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/9677edab-12db-429c-aa24-5e3a112caed4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/a8152ff6-a39c-435a-916b-d0df02388331-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/e1f383b4-0b8a-4a19-8c70-4ea23978309d-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5003445199566617082-1-5f58eff5-d9a4-44eb-a208-244d0b6b927b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5936629849848085185-1-e1f383b4-0b8a-4a19-8c70-4ea23978309d.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8476486151350891395-1-9677edab-12db-429c-aa24-5e3a112caed4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8479298697893993480-1-560427d8-9e81-4111-9b40-3aeb73af7049.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8621312932233473501-1-666190a5-67e0-431f-b142-7ddf9f933d4f.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8634123667305136474-2-a8152ff6-a39c-435a-916b-d0df02388331.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
156 files changed, 5,942 insertions(+), 555 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/10

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/10/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/10/tests/query_test/test_iceberg.py@808
PS10, Line 808: 
flake8: W292 no newline at end of file




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 16:58:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10:

(3 comments)

Thanks for the comments.

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/datasets/functional/schema_constraints.csv
File testdata/datasets/functional/schema_constraints.csv:

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/datasets/functional/schema_constraints.csv@86
PS10, Line 86: table_format:parquet/none/none
> Is the table_format correct for these new ORC backed Iceberg tables?
Yes, we currently put every table into the functional_parquet database.

It would be nice to separate them out to ORC, or to create a new functional_iceberg_parquet/orc database, but the data loading job and current test dimensions don't really support this.


http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test:

http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test@3
PS10, Line 3: iceberg_v2_no_deletes_orc
> Could you please add some info about these new tables into a README? e.g. h
Done


http://gerrit.cloudera.org:8080/#/c/18847/10/testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test@9
PS10, Line 9: aggregation(SUM, NumRowGroups): 0aggregation(SUM, NumFileMetadataRead): 0
> nit: break line in the middle?
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 16:32:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11232/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 14:00:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py@765
PS4, Line 765:   @SkipIfDockerizedCluster.internal_hostname
> This could be fixed with the custom operator in the future, if we add suppo
Yes, but we'll need to rewrite the data files as they currently have absolute paths.

I don't know how Iceberg will handle the case when it replicates a table that already has absolute paths. But we can handle all these complexities in a custom operator.
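
To illustrate why absolute paths matter here, below is a minimal sketch, assuming that a position delete is matched to a data row by exact comparison of the full file URI and the row position. The host names, ports, paths and helper names are hypothetical and are not taken from the patch or from the Impala test data.

```python
from urllib.parse import urlparse


def is_deleted(input_file_name, position, position_deletes):
    """Return True if (file, position) appears in the set of position deletes.

    Assumption: matching is an exact string comparison on the file URI.
    """
    return (input_file_name, position) in position_deletes


def path_only(uri):
    """Drop scheme and authority, keeping only the path component."""
    return urlparse(uri).path


# Hypothetical delete record written on a cluster whose namenode is 'localhost'.
deletes = {("hdfs://localhost:20500/warehouse/t/data/f1.parquet", 3)}

# The same file seen through a different (hypothetical) namenode host.
same_file_other_host = "hdfs://namenode:8020/warehouse/t/data/f1.parquet"

matches_on_original_host = is_deleted(
    "hdfs://localhost:20500/warehouse/t/data/f1.parquet", 3, deletes)
matches_on_other_host = is_deleted(same_file_other_host, 3, deletes)
```

Under this assumption the delete no longer applies once the authority part of the URI changes, even though `path_only()` returns the same path for both URIs.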




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 15:34:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 11: Code-Review+2

Thx for the changes!



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 08:56:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#9).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
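
The plan shape above can be sketched in a few lines of Python. This is only an illustration of the UNION / ANTI JOIN semantics, not the planner code: the `Row` and `PositionDelete` types and the `scan_iceberg_v2` function are hypothetical names, and the sketch assumes that a data file "has deletes" exactly when some delete record names it, which is a simplification.

```python
from typing import NamedTuple


class Row(NamedTuple):
    file_name: str  # stands in for the virtual column INPUT__FILE__NAME
    position: int   # stands in for the virtual column FILE__POSITION
    value: str      # any ordinary data column


class PositionDelete(NamedTuple):
    file_path: str  # 'file_path' column of the delete file
    pos: int        # 'pos' column of the delete file


def scan_iceberg_v2(data_rows, position_deletes):
    """Return the rows that survive the position deletes."""
    deleted = {(d.file_path, d.pos) for d in position_deletes}
    files_with_deletes = {d.file_path for d in position_deletes}

    # Left branch of the UNION: data files without delete files need no join.
    plain_scan = [r for r in data_rows
                  if r.file_name not in files_with_deletes]

    # Right branch: ANTI JOIN on (file name, position) = (file_path, pos).
    anti_join = [r for r in data_rows
                 if r.file_name in files_with_deletes
                 and (r.file_name, r.position) not in deleted]

    return plain_scan + anti_join


rows = [Row("a.parq", 0, "x"), Row("a.parq", 1, "y"), Row("b.parq", 0, "z")]
deletes = [PositionDelete("a.parq", 1)]
result = scan_iceberg_v2(rows, deletes)
```

Here `result` contains the row from "b.parq" (plain scan branch) followed by the surviving row at position 0 of "a.parq" (anti join branch).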

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support non-relative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/data/00000-0-data-boroknagyz_20220830153046_7c9aaa51-d601-4c5b-a51c-67c1e9e1fe29-job_16618645111570_0001-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/a72290c9-c518-4719-8502-6c83a881de07-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/snap-1041485290740594175-1-a72290c9-c518-4719-8502-6c83a881de07.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/588ee1ca-6a85-4af2-8ba2-e595e71712ba-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/5b14eaa8-83cb-4f71-a473-402f345fa5b5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5359840930115020310-1-5b14eaa8-83cb-4f71-a473-402f345fa5b5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5416468273053855108-1-588ee1ca-6a85-4af2-8ba2-e595e71712ba.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-data-boroknagyz_20220830153712_a4060925-000f-4454-b68b-8f7570d3aa7c-job_16618645111570_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-delete-boroknagyz_20220830153733_8f03a46b-e4a5-4643-896b-a1cc5b09cf2b-job_16618645111571_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/83d17f01-336e-41ab-a791-ffd5f511f6ab-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-1801547319505512253-1-6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-4807054508647143162-1-83d17f01-336e-41ab-a791-ffd5f511f6ab.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155232_f36b3f3e-183a-475a-a7d1-d16898fd4524-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155307_610e0b78-1186-477b-a30b-9943cb4e90e9-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155411_eee97f13-df3d-408c-bcff-8e3b0dcf8fa7-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155430_32c848c9-37ad-4f77-a0fd-0a254d15246c-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/560427d8-9e81-4111-9b40-3aeb73af7049-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/5f58eff5-d9a4-44eb-a208-244d0b6b927b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/666190a5-67e0-431f-b142-7ddf9f933d4f-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/9677edab-12db-429c-aa24-5e3a112caed4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/a8152ff6-a39c-435a-916b-d0df02388331-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/e1f383b4-0b8a-4a19-8c70-4ea23978309d-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5003445199566617082-1-5f58eff5-d9a4-44eb-a208-244d0b6b927b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5936629849848085185-1-e1f383b4-0b8a-4a19-8c70-4ea23978309d.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8476486151350891395-1-9677edab-12db-429c-aa24-5e3a112caed4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8479298697893993480-1-560427d8-9e81-4111-9b40-3aeb73af7049.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8621312932233473501-1-666190a5-67e0-431f-b142-7ddf9f933d4f.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8634123667305136474-2-a8152ff6-a39c-435a-916b-d0df02388331.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
156 files changed, 5,941 insertions(+), 553 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/9

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 9:

PS9 is a rebase + ORC tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 16:49:25 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/1/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@166
PS1, Line 166:         // The followings just create separate scan nodes for data files and position delete
line too long (92 > 90)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Mon, 15 Aug 2022 14:10:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/8500/



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 18:41:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

(21 comments)

Thanks for the comments, Tamas!

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@106
PS5, Line 106:                                     
> nit: is this our preferred indentation for longer loops?
I just found it easier to read, i.e. it makes it clear what the arguments of the concat() are.

I don't know how strict our formatting rules are; I guess we can deviate from them when needed. But I can change it if you think it makes the code harder to read because of inconsistent indentation.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@728
PS5, Line 728: 
> nit: empty line
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@71
PS5, Line 71: true
> I could not find any reference, but I believe this column cannot be null.
Good catch, done.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
File fe/src/main/java/org/apache/impala/catalog/VirtualTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@25
PS5, Line 25: import java.util.Set;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@34
PS5, Line 34: import org.apache.impala.thrift.TResultSet;
            : import org.apache.impala.thrift.TIcebergPartitionStats;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@37
PS5, Line 37: import org.apache.impala.thrift.TTableStats;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@79
PS5, Line 79:   @Override
            :   public String getTableComment() { return null; }
            : 
            :   @Override
            :   public boolean isLoaded() {
            :     return true;
            :   }
> nit: different short method format styles in this file.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@35
PS5, Line 35: import org.apache.impala.catalog.IcebergColumn;
            : import org.apache.impala.catalog.IcebergTable;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@39
PS5, Line 39: import org.apache.impala.catalog.VirtualColumn;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
File fe/src/main/java/org/apache/impala/planner/HashJoinNode.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/HashJoinNode.java@178
PS5, Line 178:       if (!isDeleteRowsJoin_ || detailLevel.ordinal() >= TExplainLevel.EXTENDED.ordinal()) {
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@1
PS5, Line 1: 
> nit: empty line
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@35
PS5, Line 35: import org.apache.iceberg.DataFile;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@88
PS5, Line 88: import com.google.common.collect.Iterables;
> Unused import.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@117
PS5, Line 117: getIceTable
> Shouldn't this getter be next to the others?
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@133
PS5, Line 133: feFsTable
> Unused variable.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@181
PS5, Line 181: ALIAS_COUNTER
> We could use the PlanNodeId here as well, it would not grow across query ex
Good idea, thanks!


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@198
PS5, Line 198: operator
> nit: node
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@212
PS5, Line 212: feTable
> Unused variable.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@227
PS5, Line 227: feTable
> Unused variable.
Done


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@232
PS5, Line 232:                  
> nit: indentation.
I'd rather keep it this way for readability.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@278
PS5, Line 278: fileDescList
> Unused variable.
Done



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 13:39:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11233/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 14:09:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/11/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/11/tests/query_test/test_iceberg.py@808
PS11, Line 808: 
flake8: W292 no newline at end of file



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 16:32:47 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11272/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 17:01:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 10:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8516/ DRY_RUN=true


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 31 Aug 2022 09:15:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

Thanks for the changes Zoltan, LGTM!
The verify job failed on the negative test that forbids the V2 format.

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
File fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java@106
PS5, Line 106:                                     
> I just found it easier to be read, i.e. it makes it clear what are the argu
It is fine; I was just curious whether you knew of an indentation rule I was not aware of :).



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 09:19:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Anonymous Coward (Code Review)" <ge...@cloudera.org>.
lipenglin@sensorsdata.cn has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7: Code-Review+1

That's great. LGTM!


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 10:37:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11153/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Mon, 15 Aug 2022 14:32:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7:

PS7 is only a rebase.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 13:52:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11171/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Wed, 17 Aug 2022 14:32:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 8:

Thanks everyone for the review!

I think I'll add some tests for ORC data files as well once https://gerrit.cloudera.org/#/c/18909/ goes in.

Then we can move forward with merging this.


Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 12:34:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
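
The plan shape above can be illustrated with a minimal in-memory sketch.
This is not the planner code from the patch: the class
'PositionDeleteSketch', its nested types and the 'scan' method are
hypothetical names invented for illustration, and the hash-set lookup
merely stands in for the ANTI JOIN on (INPUT__FILE__NAME, FILE__POSITION)
= ('file_path', 'pos').

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Impala.
class PositionDeleteSketch {
  // A row read from a data file, tagged with the two virtual columns.
  static final class DataRow {
    final String inputFileName;  // stands in for INPUT__FILE__NAME
    final long filePosition;     // stands in for FILE__POSITION
    final String value;          // stands in for the user-visible columns

    DataRow(String inputFileName, long filePosition, String value) {
      this.inputFileName = inputFileName;
      this.filePosition = filePosition;
      this.value = value;
    }
  }

  // A record read from a position delete file.
  static final class DeleteRecord {
    final String filePath;  // the 'file_path' column
    final long pos;         // the 'pos' column

    DeleteRecord(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }
  }

  // Returns the values of all rows that survive the position deletes.
  static List<String> scan(List<DataRow> rowsWithoutDeletes,
      List<DataRow> rowsWithDeletes, List<DeleteRecord> deletes) {
    // Build side of the ANTI JOIN: deleted positions grouped by file path.
    Map<String, Set<Long>> deleted = new HashMap<>();
    for (DeleteRecord d : deletes) {
      deleted.computeIfAbsent(d.filePath, k -> new HashSet<>()).add(d.pos);
    }

    List<String> result = new ArrayList<>();
    // Left UNION operand: data files without delete files need no join.
    for (DataRow r : rowsWithoutDeletes) result.add(r.value);
    // Right UNION operand: keep a row only if no delete record matches it.
    for (DataRow r : rowsWithDeletes) {
      Set<Long> positions = deleted.get(r.inputFileName);
      boolean isDeleted = positions != null && positions.contains(r.filePosition);
      if (!isDeleted) result.add(r.value);
    }
    return result;
  }
}
```

For example, with rows a0 and a1 in a data file that has no delete file,
rows b0, b1 and b2 at positions 0, 1 and 2 of 'b.parquet', and a single
delete record ('b.parquet', 1), 'scan' returns a0, a1, b0, b2.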

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support non-relative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Reviewed-on: http://gerrit.cloudera.org:8080/18847
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/data/00000-0-data-boroknagyz_20220830153046_7c9aaa51-d601-4c5b-a51c-67c1e9e1fe29-job_16618645111570_0001-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/a72290c9-c518-4719-8502-6c83a881de07-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/snap-1041485290740594175-1-a72290c9-c518-4719-8502-6c83a881de07.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=click/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00002.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=download/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-data-boroknagyz_20220830173852_560d6f0e-8508-42d6-95a7-67bfa860a28c-job_16618645111571_0006-00003.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/data/action=view/00000-0-delete-boroknagyz_20220830173925_010bf879-cf2f-4209-a70b-1a8309657106-job_16618645111571_0006-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/588ee1ca-6a85-4af2-8ba2-e595e71712ba-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/5b14eaa8-83cb-4f71-a473-402f345fa5b5-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5359840930115020310-1-5b14eaa8-83cb-4f71-a473-402f345fa5b5.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/snap-5416468273053855108-1-588ee1ca-6a85-4af2-8ba2-e595e71712ba.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-data-boroknagyz_20220830153712_a4060925-000f-4454-b68b-8f7570d3aa7c-job_16618645111570_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/data/00000-0-delete-boroknagyz_20220830153733_8f03a46b-e4a5-4643-896b-a1cc5b09cf2b-job_16618645111571_0002-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/83d17f01-336e-41ab-a791-ffd5f511f6ab-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-1801547319505512253-1-6d54c2e9-a4c7-4c5c-8a6f-a17f92de4c48.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/snap-4807054508647143162-1-83d17f01-336e-41ab-a791-ffd5f511f6ab.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155232_f36b3f3e-183a-475a-a7d1-d16898fd4524-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155307_610e0b78-1186-477b-a30b-9943cb4e90e9-job_16618645111570_0003-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-data-boroknagyz_20220830155411_eee97f13-df3d-408c-bcff-8e3b0dcf8fa7-job_16618645111570_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155337_fcbda57d-f37d-48cd-afe4-28d0bad0a4c0-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/data/00000-0-delete-boroknagyz_20220830155430_32c848c9-37ad-4f77-a0fd-0a254d15246c-job_16618645111571_0004-00001.orc
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/560427d8-9e81-4111-9b40-3aeb73af7049-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/5f58eff5-d9a4-44eb-a208-244d0b6b927b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/666190a5-67e0-431f-b142-7ddf9f933d4f-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/9677edab-12db-429c-aa24-5e3a112caed4-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/a8152ff6-a39c-435a-916b-d0df02388331-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/e1f383b4-0b8a-4a19-8c70-4ea23978309d-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5003445199566617082-1-5f58eff5-d9a4-44eb-a208-244d0b6b927b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-5936629849848085185-1-e1f383b4-0b8a-4a19-8c70-4ea23978309d.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8476486151350891395-1-9677edab-12db-429c-aa24-5e3a112caed4.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8479298697893993480-1-560427d8-9e81-4111-9b40-3aeb73af7049.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8621312932233473501-1-666190a5-67e0-431f-b142-7ddf9f933d4f.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/snap-8634123667305136474-2-a8152ff6-a39c-435a-916b-d0df02388331.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files_orc/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
M testdata/workloads/functional-query/queries/QueryTest/iceberg-negative.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes-orc.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
156 files changed, 5,985 insertions(+), 555 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Gabor Kaszab (Code Review)" <ge...@cloudera.org>.
Gabor Kaszab has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 8:

(6 comments)

Sorry for being this late to the review. I did a quick pass and left some minor comments.

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@572
PS7, Line 572: Iceberg DataFile
nit: This is not just for dataFiles anymore, right?


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@579
PS7, Line 579: getDataFilePathHash
nit: same as above: this is for deleteFiles as well, right?


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@683
PS7, Line 683:      */
Could you please add a comment stating that the current snapshot is used when travelSpec is null?


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@692
PS7, Line 692:         long totalDeleteFiles = Long.parseLong(totalDeleteFilesStr);
Don't you need a try-catch block here, similar to the one at L699-706?
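
A minimal sketch of the guarded parse asked about here. The helper name parseLongOrDefault, the fallback value, and the property keys used in main are assumptions made for illustration; they are not taken from the patch.

```java
import java.util.HashMap;
import java.util.Map;

final class SnapshotSummaryParsing {
  // Hypothetical helper: returns the parsed value, or defaultValue when the
  // property is missing or is not a valid long.
  public static long parseLongOrDefault(
      Map<String, String> summary, String key, long defaultValue) {
    String value = summary.get(key);
    if (value == null) return defaultValue;
    try {
      return Long.parseLong(value.trim());
    } catch (NumberFormatException e) {
      // A malformed summary entry should not fail the whole operation.
      return defaultValue;
    }
  }

  public static void main(String[] args) {
    Map<String, String> summary = new HashMap<>();
    summary.put("total-delete-files", "3");   // illustrative key and value
    summary.put("total-data-files", "n/a");   // deliberately malformed
    System.out.println(parseLongOrDefault(summary, "total-delete-files", 0L));
    System.out.println(parseLongOrDefault(summary, "total-data-files", -1L));
    System.out.println(parseLongOrDefault(summary, "missing-key", -1L));
  }
}
```

Whether a malformed value should fall back to a default or be reported to the caller is a design choice for the patch author; the sketch only shows the shape of the guard.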


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@718
PS7, Line 718:       if (snapshot == null) { return null; }
nit: curly brackets are not needed here.


http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/18847/7/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@558
PS7, Line 558:         for (DeleteFile delFile : scanTask.deletes()) {
I think instead of this for loop you can use Set.addAll().
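
A small sketch of what this could look like, assuming the loop body only adds each element to a set. The Task class below is a hypothetical stand-in for the scan task type and uses plain strings in place of DeleteFile objects.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class CollectDeleteFiles {
  // Hypothetical stand-in for a scan task; only deletes() matters here.
  public static final class Task {
    private final List<String> deletes;
    public Task(List<String> deletes) { this.deletes = deletes; }
    public List<String> deletes() { return deletes; }
  }

  public static Set<String> collect(List<Task> tasks) {
    Set<String> deleteFiles = new HashSet<>();
    for (Task task : tasks) {
      // Replaces: for (String f : task.deletes()) { deleteFiles.add(f); }
      deleteFiles.addAll(task.deletes());
    }
    return deleteFiles;
  }

  public static void main(String[] args) {
    List<Task> tasks = Arrays.asList(
        new Task(Arrays.asList("d1", "d2")),
        new Task(Arrays.asList("d2", "d3")));
    // The shared delete file "d2" is stored only once.
    System.out.println(collect(tasks).size());
  }
}
```

This only applies if the loop does nothing else per element; if it also converts or filters each delete file, the explicit loop may still be needed.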



-- 
To view, visit http://gerrit.cloudera.org:8080/18847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 12:52:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#7).

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
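
The semantics of this plan can be sketched in plain Java, assuming in-memory
rows instead of real scan nodes. The class and method names below are
hypothetical and are not part of the patch; only the column names come from
the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PositionDeleteSketch {
  // A data row together with its INPUT__FILE__NAME and FILE__POSITION values.
  public static final class Row {
    public final String filePath;
    public final long pos;
    public final String value;
    public Row(String filePath, long pos, String value) {
      this.filePath = filePath;
      this.pos = pos;
      this.value = value;
    }
  }

  // 'deletes' maps 'file_path' to the set of deleted 'pos' values.
  public static List<String> scan(List<Row> rowsWithoutDeletes,
      List<Row> rowsWithDeletes, Map<String, Set<Long>> deletes) {
    List<String> result = new ArrayList<>();
    // Left UNION branch: data files without delete files are scanned as-is.
    for (Row row : rowsWithoutDeletes) result.add(row.value);
    // Right UNION branch: the ANTI JOIN keeps rows with no matching delete.
    for (Row row : rowsWithDeletes) {
      Set<Long> deletedPositions = deletes.get(row.filePath);
      if (deletedPositions == null || !deletedPositions.contains(row.pos)) {
        result.add(row.value);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<Row> plain = Arrays.asList(new Row("f1", 0, "a"));
    List<Row> withDeletes =
        Arrays.asList(new Row("f2", 0, "b"), new Row("f2", 1, "c"));
    Map<String, Set<Long>> deletes = new HashMap<>();
    deletes.put("f2", new HashSet<>(Arrays.asList(0L)));
    // Row "b" is deleted; rows "a" and "c" survive.
    System.out.println(scan(plain, withDeletes, deletes));
  }
}
```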

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return an error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
A fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HashJoinNode.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/JoinNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/data/README
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/data/00000-0-data-boroknagyz_20220819180420_a7e5a731-8762-4b59-b3f2-fe6f065cf59b-job_16597105613620_0031-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/5c80922f-01b5-4d52-bc93-6505be3b977b-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/snap-728158873687794725-1-5c80922f-01b5-4d52-bc93-6505be3b977b.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_no_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00002.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=click/874b32d9a15da206-f60e01cb00000003_1034098606_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00003.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=download/874b32d9a15da206-f60e01cb00000003_1489587766_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/00000-0-delete-boroknagyz_20220819183231_cfc565f5-52b9-4669-9f69-d29c50a84a5e-job_16597105613621_0032-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/data/action=view/874b32d9a15da206-f60e01cb00000004_1711435901_data.0.parq
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/464c179e-c9ba-40f5-a35f-144106a1f16c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/771485e9-78ac-4ffc-b1ef-1fda5bab33cf-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-2057976186205897384-1-771485e9-78ac-4ffc-b1ef-1fda5bab33cf.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/snap-8885697082976537578-1-464c179e-c9ba-40f5-a35f-144106a1f16c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_partitioned_position_deletes/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-data-boroknagyz_20220819150927_e95cefe8-5549-42f6-98a6-b1bbea9a4224-job_16597105613620_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/data/00000-0-delete-boroknagyz_20220819150939_8f0789df-fcdb-4ba8-875c-4aeac53f30b9-job_16597105613621_0025-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/236523f7-a5bc-459f-b4c9-16af5bd43bca-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/816400dd-012d-40c5-ab65-bc16ff18d2d7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-444149380144800647-1-816400dd-012d-40c5-ab65-bc16ff18d2d7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/snap-8593920101374128463-1-236523f7-a5bc-459f-b4c9-16af5bd43bca.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_delete_all_rows/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154646_1cad8c38-c65e-4c7c-b516-c4d9faf82448-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154718_db95aeae-d530-4fba-8336-e47fa712b987-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154733_162cd69d-80e2-425b-9638-9008d0937573-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-data-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613620_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819154922_62429d29-6c44-4707-b348-ac189b8d79d3-job_16597105613621_0026-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/data/00000-0-delete-boroknagyz_20220819155000_b0411d64-66e3-49fd-a2f0-dab69282a896-job_16597105613621_0027-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/969de65c-8915-4ae5-8d54-a82701195c55-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/c4e37595-32c2-483f-b6d7-866f60e36976-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/db6f17fe-6fb6-4120-839d-4d6ca5244a1c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ea749da1-7b98-4dca-a4eb-f7d5d62f9dde-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-1497619269847778439-1-ea749da1-7b98-4dca-a4eb-f7d5d62f9dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-4363979609026842966-1-db6f17fe-6fb6-4120-839d-4d6ca5244a1c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-5762682948883272650-1-969de65c-8915-4ae5-8d54-a82701195c55.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7490459762454857930-1-ec9e7ecc-b546-42dd-8d0d-0dde2182dbc7.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-7508485421322116327-1-4ba5ef32-7f8b-4418-b5bf-3fb8002e0dde.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/snap-752781918366351945-2-c4e37595-32c2-483f-b6d7-866f60e36976.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v5.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v6.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/v7.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_not_all_data_files_have_delete_files/metadata/version-hint.txt
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155751_2ddc0706-4c90-42db-927b-d92f68b8ae1e-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-data-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613620_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/data/00000-0-delete-boroknagyz_20220819155811_bf677db1-cb6c-451b-afc8-4013ae7eaf1f-job_16597105613621_0030-00001.parquet
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/0f93551b-cc05-4e55-a65b-6f78e634cf4c-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/d399b245-f138-4ae2-bd1f-f82e0ee16023-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/da27d446-94df-4850-85dd-c8edcda7685e-m0.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-3877007445826010687-2-da27d446-94df-4850-85dd-c8edcda7685e.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-5392552459484846077-1-d399b245-f138-4ae2-bd1f-f82e0ee16023.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/snap-6537343489442250314-1-0f93551b-cc05-4e55-a65b-6f78e634cf4c.avro
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v1.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v2.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v3.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/v4.metadata.json
A testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_positional_update_all_rows/metadata/version-hint.txt
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-planner/queries/PlannerTest/iceberg-v2-tables.test
A testdata/workloads/functional-query/queries/QueryTest/iceberg-v2-read-position-deletes.test
M tests/query_test/test_iceberg.py
98 files changed, 4,215 insertions(+), 547 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/7
-- 
To view, visit http://gerrit.cloudera.org:8080/18847
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8500/ DRY_RUN=true



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 13:52:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 12: Verified+1



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 01 Sep 2022 16:51:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11226/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 14:49:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11200/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Mon, 22 Aug 2022 11:38:06 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py@749
PS4, Line 749: class TestIcebergV2Table(IcebergTestSuite):
flake8: E302 expected 2 blank lines, found 1




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Comment-Date: Mon, 22 Aug 2022 11:17:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@728
PS5, Line 728: 
nit: empty line


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@71
PS5, Line 71: true
I could not find any reference, but I believe this column cannot be null.


http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
File fe/src/main/java/org/apache/impala/catalog/VirtualTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/5/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@79
PS5, Line 79:   @Override
            :   public String getTableComment() { return null; }
            : 
            :   @Override
            :   public boolean isLoaded() {
            :     return true;
            :   }
nit: different short method format styles in this file.




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 12:39:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 5:

(14 comments)

Thanks for the comments!

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@559
PS4, Line 559: allFiles 
> nit: This is a pair of List<DataFile> and Set<DeleteFile>, maybe we should 
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@48
PS4, Line 48: /**
> nit: "table is created"
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergPositionDeleteTable.java@50
PS4, Line 50: e 
> nit: "of"
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/IcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java@129
PS4, Line 129: _ID = 2147
> Just want to know: where does this number come from?
These are coming from the Iceberg spec https://iceberg.apache.org/spec/#position-delete-files


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java
File fe/src/main/java/org/apache/impala/catalog/VirtualTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/VirtualTable.java@41
PS4, Line 41: that
> nit: "that"
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@35
PS4, Line 35: import org.apache.impala.catalog.IcebergColumn;
            : import org.apache.impala.catalog.IcebergTable;
> nit: It may be unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@39
PS4, Line 39: import org.apache.impala.catalog.Virtu
> ditto.
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java@48
PS4, Line 48: import org.apache.impala.catalog.Virtu
> nit: it may be unnecessary.
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@204
PS4, Line 204:           .add(HdfsFileFormat.ICEBERG)
> Don't Iceberg tables return the underlying file format?
At the coordinator side, yes. But on the CatalogD side and in PlannerTests it is Iceberg. In old catalog mode we use another hack to get the underlying file format, i.e. during thrift conversion we substitute the file format:
https://github.com/apache/impala/blob/bf103414012ba7596563c5f3292380f8b11524e9/fe/src/main/java/org/apache/impala/catalog/IcebergTable.java#L487

It was an easier change than adding another hack to HdfsPartition. Btw, don't we need this change anyway to handle mixed-format tables?


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
File fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java:

http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@93
PS4, Line 93: 
> nit: should be IcebergScanPlanner.
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@125
PS4, Line 125: blRef;
> nit: I think this name is a bit confusing, as we do have an IcebergScanNode
Done. Renamed

 createIcebergScanNode -> createIcebergScanPlan
 createIcebergScanPlan -> createComplexIcebergScanPlan


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@197
PS4, Line 197: 
> magic number
I got rid of this code by calculating proper stats for the delete files and letting Impala's planner figure out the distribution mode.


http://gerrit.cloudera.org:8080/#/c/18847/4/fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java@202
PS4, Line 202:     JoinNode joinNode = new HashJoinNode(dataScanNode, deleteScanNode,
             :         /*straight_join=*/true, DistributionMode.NONE, JoinOperator.LEFT_ANTI_JOIN,
             :         positionJoinConjuncts, /*otherJoinConjuncts=*/Collections.emptyList());
             :     joinNode.setId(ctx_.getNextNodeId());
             :     joinNode.init(analyzer_);
             :     joinNode.setIsDeleteRowsJ
> As I understand this is (similar but) unrelated to ACID, we should not let 
Done


http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18847/4/tests/query_test/test_iceberg.py@749
PS4, Line 749: 
> flake8: E302 expected 2 blank lines, found 1
Done




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Aug 2022 14:28:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18847 )

Change subject: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/11260/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 17:06:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Tamas Mate, Gabor Kaszab, lipenglin@sensorsdata.cn, Gergely Fürnstáhl, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/18847

to look at the new patch set (#2).

Change subject: WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables
......................................................................

WIP: IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables that use
position deletes. Equality deletes are still not supported. Position
delete files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are added as 'hidden
columns' to Iceberg tables. 'Hidden column' is a new concept introduced
by this patch.

This patch introduces a new class 'IcebergScanPlanner' which is
responsible for creating the plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node whose
results are UNIONed with the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes
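
The plan shape above can be illustrated with a minimal in-memory
sketch. This is not Impala code: the class PositionDeleteSketch and
its Row and PositionDelete types are hypothetical names made up for
illustration, and plain Java collections stand in for the SCAN, ANTI
JOIN and UNION nodes. It only shows the intended semantics, assuming a
data row is dropped exactly when a delete record matches its file path
and position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration only: none of these types exist in Impala.
final class PositionDeleteSketch {

  // A data file row tagged with the two virtual columns.
  static final class Row {
    final String inputFileName;  // stands in for INPUT__FILE__NAME
    final long filePosition;     // stands in for FILE__POSITION
    final String value;

    Row(String inputFileName, long filePosition, String value) {
      this.inputFileName = inputFileName;
      this.filePosition = filePosition;
      this.value = value;
    }
  }

  // A position delete record: the 'file_path' and 'pos' columns.
  static final class PositionDelete {
    final String filePath;
    final long pos;

    PositionDelete(String filePath, long pos) {
      this.filePath = filePath;
      this.pos = pos;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof PositionDelete)) return false;
      PositionDelete other = (PositionDelete) o;
      return filePath.equals(other.filePath) && pos == other.pos;
    }

    @Override
    public int hashCode() { return Objects.hash(filePath, pos); }
  }

  // LEFT ANTI JOIN: keep a data row only if no delete record matches it.
  static List<Row> antiJoin(List<Row> dataRows, Set<PositionDelete> deletes) {
    List<Row> result = new ArrayList<>();
    for (Row row : dataRows) {
      PositionDelete key = new PositionDelete(row.inputFileName, row.filePosition);
      if (!deletes.contains(key)) result.add(row);
    }
    return result;
  }

  // UNION of the plain scan (files without deletes) and the anti-joined scan.
  static List<Row> scan(List<Row> rowsWithoutDeletes, List<Row> rowsWithDeletes,
      Set<PositionDelete> deletes) {
    List<Row> result = new ArrayList<>(rowsWithoutDeletes);
    result.addAll(antiJoin(rowsWithDeletes, deletes));
    return result;
  }

  public static void main(String[] args) {
    List<Row> noDeletes = Arrays.asList(new Row("a.parquet", 0, "a0"));
    List<Row> withDeletes = Arrays.asList(
        new Row("b.parquet", 0, "b0"),
        new Row("b.parquet", 1, "b1"),
        new Row("b.parquet", 2, "b2"));
    Set<PositionDelete> deletes =
        new HashSet<>(Arrays.asList(new PositionDelete("b.parquet", 1)));
    for (Row row : scan(noDeletes, withDeletes, deletes)) {
      System.out.println(row.value);
    }
  }
}
```

Running main prints a0, b0 and b2 on separate lines: the row at
position 1 of b.parquet is filtered out by the anti join, while the
rows of a.parquet bypass the join entirely.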

Predicate pushdown and time travel logic are moved from
IcebergScanNode to IcebergScanPlanner.

TODO:
 * better cardinality estimates
 * add tests

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
---
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/SelectStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/FeCatalogUtils.java
M fe/src/main/java/org/apache/impala/catalog/FeFsTable.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
M fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanPlanner.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/planner/UnionNode.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/data/iceberg_test/hadoop_catalog/ice/iceberg_v2_delete_positional/data/00191-4-6e780302-527b-4911-8c6e-88d416adac57-00001.parquet
18 files changed, 850 insertions(+), 537 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/47/18847/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Gerrit-Change-Number: 18847
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <li...@sensorsdata.cn>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>