Posted to reviews@impala.apache.org by "Quanlong Huang (Code Review)" <ge...@cloudera.org> on 2022/08/26 00:25:57 UTC

[Impala-ASF-CR](branch-4.1.1) IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Hello Zoltan Borok-Nagy, Impala Public Jenkins,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/18912

to review the following change.


Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
untouched and therefore lack field IDs. Previously, Impala used
name-based column resolution in this case.
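
To illustrate why name-based resolution can be a problem for migrated
tables, here is a minimal sketch, not Impala's actual code. The function
names, the column names, and the rename scenario are all hypothetical. It
assumes a table column was renamed after migration while the old data file
still carries the original name.

```python
from typing import Dict, List, Optional


def resolve_by_name(column_name: str, file_columns: List[str]) -> Optional[int]:
    """Return the index of the file column with a matching name, or None."""
    return file_columns.index(column_name) if column_name in file_columns else None


def resolve_by_field_id(field_id: int, file_field_ids: Dict[int, int]) -> Optional[int]:
    """Return the index of the file column carrying this field ID, or None."""
    return file_field_ids.get(field_id)


# Hypothetical old data file, written before the table was migrated.
file_columns = ["id", "name"]
# Field IDs attached to the file columns (field ID -> column index).
file_field_ids = {1: 0, 2: 1}

# Hypothetical table schema after renaming "name" to "full_name".
# The rename keeps field ID 2.
table_schema = {"id": 1, "full_name": 2}

by_name = resolve_by_name("full_name", file_columns)
by_id = resolve_by_field_id(table_schema["full_name"], file_field_ids)
```

In this sketch the name lookup finds nothing for the renamed column, while
the field ID lookup still reaches the second file column.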

Added a feature that traverses the data files' schema before column
resolution and assigns field IDs the same way Iceberg would, so that
field-ID-based column resolution can be used.
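
The ID assignment described above might look roughly like the sketch
below. It is an illustration under stated assumptions, not the code in
parquet-metadata-utils.cc or orc-metadata-utils.cc. SchemaNode and
assign_field_ids are hypothetical names, only struct nesting is modelled,
and the sketch assumes that all fields of a struct receive their IDs before
any nested struct is descended into, which is my understanding of how
Iceberg hands out fresh IDs.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Iterator, List, Optional


@dataclass
class SchemaNode:
    """Hypothetical stand-in for one column in a data file's schema."""
    name: str
    children: List["SchemaNode"] = field(default_factory=list)
    field_id: Optional[int] = None


def assign_field_ids(node: SchemaNode, ids: Optional[Iterator[int]] = None) -> None:
    """Give every descendant of `node` a field ID, starting from 1.

    Assumption: all direct children get consecutive IDs first, and only
    then is each child visited in order to number its own children.
    """
    if ids is None:
        ids = count(1)
    for child in node.children:
        child.field_id = next(ids)
    for child in node.children:
        assign_field_ids(child, ids)


# Hypothetical file schema: id, point(x, y), name
root = SchemaNode("table", [
    SchemaNode("id"),
    SchemaNode("point", [SchemaNode("x"), SchemaNode("y")]),
    SchemaNode("name"),
])
assign_field_ids(root)
```

Under this assumption the top-level columns id, point and name get IDs 1, 2
and 3, and the nested fields x and y get 4 and 5. A plain pre-order
numbering would instead give x and y the IDs 3 and 4, so the order of
assignment matters once nested types are involved.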

Testing:

The default resolution method was changed to field ID for migrated
tables; existing tests use it from now on.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/data/iceberg_test/iceberg_migrated_alter_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/c9f83a82-60f4-443b-9ca4-359cad16fe12-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/snap-2941076094076108396-1-c9f83a82-60f4-443b-9ca4-359cad16fe12.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/340a3b82-71e3-4f50-b030-aecb5a5ea730-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/snap-2205107170480729038-1-340a3b82-71e3-4f50-b030-aecb5a5ea730.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/152e384f-2851-44b7-9ada-1bfbec74e9fc-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/snap-3911840040574896148-1-152e384f-2851-44b7-9ada-1bfbec74e9fc.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/8588fd4b-13c1-4451-80ad-5cf71a959b94-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/snap-3622599918649152504-1-8588fd4b-13c1-4451-80ad-5cf71a959b94.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
32 files changed, 1,874 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/12/18912/1
-- 
To view, visit http://gerrit.cloudera.org:8080/18912
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: newchange
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18912
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR](branch-4.1.1) IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Tamas Mate (Code Review)" <ge...@cloudera.org>.
Tamas Mate has posted comments on this change. ( http://gerrit.cloudera.org:8080/18912 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 1: Code-Review+2



Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18912
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 30 Aug 2022 08:00:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](branch-4.1.1) IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/18912 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 1: Verified+1

This is a clean cherry-pick. Verified in https://jenkins.impala.io/job/gerrit-verify-dryrun/8498/ (ignored the flaky test failure).



Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18912
Gerrit-PatchSet: 1
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Aug 2022 05:39:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR](branch-4.1.1) IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18912 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
untouched and therefore lack field IDs. Previously, Impala used
name-based column resolution in this case.

Added a feature that traverses the data files' schema before column
resolution and assigns field IDs the same way Iceberg would, so that
field-ID-based column resolution can be used.

Testing:

The default resolution method was changed to field ID for migrated
tables; existing tests use it from now on.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-on: http://gerrit.cloudera.org:8080/18912
Tested-by: Quanlong Huang <hu...@gmail.com>
Reviewed-by: Tamas Mate <tm...@apache.org>

Approvals:
  Quanlong Huang: Verified
  Tamas Mate: Looks good to me, approved


Gerrit-Project: Impala-ASF
Gerrit-Branch: branch-4.1.1
Gerrit-MessageType: merged
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18912
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>