You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org> on 2022/06/27 08:34:10 UTC

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Gergely Fürnstáhl has uploaded this change for review. ( http://gerrit.cloudera.org:8080/18639


Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

TBD

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
4 files changed, 117 insertions(+), 21 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/6
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
8 files changed, 402 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/10
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/data/iceberg_test/iceberg_migrated_alter_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/c9f83a82-60f4-443b-9ca4-359cad16fe12-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/snap-2941076094076108396-1-c9f83a82-60f4-443b-9ca4-359cad16fe12.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/340a3b82-71e3-4f50-b030-aecb5a5ea730-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/snap-2205107170480729038-1-340a3b82-71e3-4f50-b030-aecb5a5ea730.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/152e384f-2851-44b7-9ada-1bfbec74e9fc-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/snap-3911840040574896148-1-152e384f-2851-44b7-9ada-1bfbec74e9fc.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/8588fd4b-13c1-4451-80ad-5cf71a959b94-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/snap-3622599918649152504-1-8588fd4b-13c1-4451-80ad-5cf71a959b94.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
32 files changed, 1,874 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/14
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/data/iceberg_test/iceberg_migrated_alter_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/c9f83a82-60f4-443b-9ca4-359cad16fe12-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/snap-2941076094076108396-1-c9f83a82-60f4-443b-9ca4-359cad16fe12.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/340a3b82-71e3-4f50-b030-aecb5a5ea730-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/snap-2205107170480729038-1-340a3b82-71e3-4f50-b030-aecb5a5ea730.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/152e384f-2851-44b7-9ada-1bfbec74e9fc-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/snap-3911840040574896148-1-152e384f-2851-44b7-9ada-1bfbec74e9fc.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/8588fd4b-13c1-4451-80ad-5cf71a959b94-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/snap-3622599918649152504-1-8588fd4b-13c1-4451-80ad-5cf71a959b94.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
32 files changed, 1,846 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/12
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10960/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Jul 2022 14:26:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 14:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@310
PS12, Line 310:     if (LIKELY(child->hasAttributeKey(ICEBERG_FIELD_ID))) {
> nit: I think you could add UNLIKELY here
Done


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       std::string field_id_str = child->getAttributeValue(ICEBERG_FIELD_ID);
> During init we only checked whether the root type's first child has field i
Done


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       std::string field_id_str = child->getAttributeValue(ICEBERG_FIELD_ID);
              :       child_field_id = GetFieldIdFromStr(field_id_str);
              :     } else {
              :       child_field_id = GetGeneratedFieldID(child);
> nit: For readability, you could move it to a function e.g. "GetGeneratedFie
Done


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@835
PS12, Line 835:     const int& field_id) const {
> nit: you could add LIKELY here
Done


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@838
PS12, Line 838:     SchemaNode* child = &node->children[idx];
              : 
              :     int child_field_id = 0;
              : 
> nit: For readability, you could move it to a function e.g. "GetGeneratedFie
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Jul 2022 11:30:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/data/iceberg_test/iceberg_migrated_alter_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/c9f83a82-60f4-443b-9ca4-359cad16fe12-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/snap-2941076094076108396-1-c9f83a82-60f4-443b-9ca4-359cad16fe12.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/340a3b82-71e3-4f50-b030-aecb5a5ea730-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/snap-2205107170480729038-1-340a3b82-71e3-4f50-b030-aecb5a5ea730.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/152e384f-2851-44b7-9ada-1bfbec74e9fc-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/snap-3911840040574896148-1-152e384f-2851-44b7-9ada-1bfbec74e9fc.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/8588fd4b-13c1-4451-80ad-5cf71a959b94-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/snap-3622599918649152504-1-8588fd4b-13c1-4451-80ad-5cf71a959b94.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
32 files changed, 1,874 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/16
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10882/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 6
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 27 Jun 2022 08:54:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 12:

(6 comments)

Left a few minor comments, but the change looks great!

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc
File be/src/exec/orc-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@310
PS12, Line 310:     if (!child->hasAttributeKey(ICEBERG_FIELD_ID)) {
nit: I think you could add UNLIKELY here


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       // Already traversed through the file during init, find is guaranteed to hit
During init we only checked whether the root type's first child has field id. What happens if some types have field ids and some don't? We should probably raise an error / return null.


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/orc-metadata-utils.cc@311
PS12, Line 311:       // Already traversed through the file during init, find is guaranteed to hit
              :       DCHECK(orc_type_to_field_id_.find(child) != orc_type_to_field_id_.end());
              : 
              :       child_field_id = orc_type_to_field_id_.find(child)->second;
nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID"


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@835
PS12, Line 835:     if (child->element->__isset.field_id) {
nit: you could add LIKELY here


http://gerrit.cloudera.org:8080/#/c/18639/12/be/src/exec/parquet/parquet-metadata-utils.cc@838
PS12, Line 838:       // Traversed through the file during init, find is guaranteed to hit
              :       DCHECK(schema_node_to_field_id_.find(child) != schema_node_to_field_id_.end());
              : 
              :       child_field_id = schema_node_to_field_id_.find(child)->second;
nit: For readability, you could move it to a function e.g. "GetGeneratedFieldID"


http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py@106
PS12, Line 106:   def test_migrated_table_field_id_resolution(self, vector, unique_database):
you should add annotation "@SkipIf.not_hdfs"



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 08 Jul 2022 14:31:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 16: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jul 2022 13:06:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/data/iceberg_test/iceberg_migrated_alter_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/c9f83a82-60f4-443b-9ca4-359cad16fe12-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/snap-2941076094076108396-1-c9f83a82-60f4-443b-9ca4-359cad16fe12.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/340a3b82-71e3-4f50-b030-aecb5a5ea730-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/snap-2205107170480729038-1-340a3b82-71e3-4f50-b030-aecb5a5ea730.avro
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_alter_test_orc/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/152e384f-2851-44b7-9ada-1bfbec74e9fc-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/snap-3911840040574896148-1-152e384f-2851-44b7-9ada-1bfbec74e9fc.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test/metadata/version-hint.text
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/000000_0
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/8588fd4b-13c1-4451-80ad-5cf71a959b94-m0.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/snap-3622599918649152504-1-8588fd4b-13c1-4451-80ad-5cf71a959b94.avro
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v1.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/v2.metadata.json
A testdata/data/iceberg_test/iceberg_migrated_complex_test_orc/metadata/version-hint.text
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
32 files changed, 1,874 insertions(+), 21 deletions(-)

Approvals:
  Zoltan Borok-Nagy: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 17
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 10:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18639/10/tests/common/file_utils.py
File tests/common/file_utils.py:

http://gerrit.cloudera.org:8080/#/c/18639/10/tests/common/file_utils.py@54
PS10, Line 54: s
flake8: W291 trailing whitespace


http://gerrit.cloudera.org:8080/#/c/18639/10/tests/common/file_utils.py@54
PS10, Line 54:   impala_client.execute("""create external table {0} stored as iceberg location '{1}' 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jun 2022 16:51:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 16: Code-Review+2

(1 comment)

LGTM!

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc@847
PS14, Line 847: retur
> I would not return INVALID_ID on L850. idx >= size signals that we ran out 
OK, I see that at L802 we check the case when 'file_idx >= node->children.size()'



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jul 2022 08:13:12 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 16:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/8308/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 16
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 14 Jul 2022 08:13:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10944/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 11 Jul 2022 11:50:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
7 files changed, 181 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/8
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10898/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 11
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jun 2022 17:13:35 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact, thus missing field IDs. Previously, Impala used name based
column resolution in this case.

Added a feature to traverse through the data files before column
resolution and assign field IDs the same way as iceberg would, to be
able to use field ID based column resolutions.

Testing:

Default resolution method was changed to field id for migrated tables,
existing tests use that from now.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
---
M be/src/exec/orc-metadata-utils.cc
M be/src/exec/orc-metadata-utils.h
M be/src/exec/parquet/parquet-metadata-utils.cc
M be/src/exec/parquet/parquet-metadata-utils.h
M testdata/data/README
A testdata/workloads/functional-query/queries/QueryTest/iceberg-migrated-table-field-id-resolution.test
M tests/common/file_utils.py
M tests/query_test/test_iceberg.py
8 files changed, 402 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/18639/11
-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 11
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10900/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 12
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Jun 2022 07:25:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 8:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py
File tests/common/file_utils.py:

http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py@29
PS8, Line 29: o
flake8: E501 line too long (97 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py@33
PS8, Line 33: ,
flake8: E231 missing whitespace after ','


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py@33
PS8, Line 33: a
flake8: E501 line too long (105 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py@49
PS8, Line 49: r
flake8: E501 line too long (175 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/common/file_utils.py@53
PS8, Line 53: '
flake8: E501 line too long (126 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py@107
PS8, Line 107: t
flake8: E501 line too long (111 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py@108
PS8, Line 108: m
flake8: E501 line too long (113 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py@109
PS8, Line 109: t
flake8: E501 line too long (111 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py@110
PS8, Line 110: m
flake8: E501 line too long (113 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/18639/8/tests/query_test/test_iceberg.py@111
PS8, Line 111: q
flake8: E501 line too long (103 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jun 2022 16:41:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10897/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 10
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jun 2022 17:09:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/10896/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 8
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 29 Jun 2022 17:00:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Gergely Fürnstáhl (Code Review)" <ge...@cloudera.org>.
Gergely Fürnstáhl has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc@847
PS14, Line 847: break
> We could return idx here and INVALID_ID at L850, in case we didn't find a c
I would not return INVALID_ID on L850. idx >= size signals that we ran out of columns, INVALID_ID signals that the file is corrupted and handled differently by the caller


http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py
File tests/query_test/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/18639/12/tests/query_test/test_iceberg.py@106
PS12, Line 106:     time.sleep(1)
> you should add annotation "@SkipIf.not_hdfs"
S3A supports "hadoop fs" calls and "hdfs dfs" is translated into that.

Tested it on S3 environment and the test passes, no need to skip



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 13 Jul 2022 14:03:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/18639 )

Change subject: IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables
......................................................................


Patch Set 14: Code-Review+1

(1 comment)

Looks great!

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc
File be/src/exec/parquet/parquet-metadata-utils.cc:

http://gerrit.cloudera.org:8080/#/c/18639/14/be/src/exec/parquet/parquet-metadata-utils.cc@847
PS14, Line 847: break
We could return idx here and INVALID_ID at L850, in case we didn't find a child with the given id.



-- 
To view, visit http://gerrit.cloudera.org:8080/18639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Gerrit-Change-Number: 18639
Gerrit-PatchSet: 14
Gerrit-Owner: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Gabor Kaszab <ga...@cloudera.com>
Gerrit-Reviewer: Gergely Fürnstáhl <gf...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tamas Mate <tm...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 12 Jul 2022 15:35:29 +0000
Gerrit-HasComments: Yes