Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/07/15 18:34:00 UTC

[jira] [Commented] (IMPALA-11034) Resolve schema of old data files in migrated Iceberg tables

    [ https://issues.apache.org/jira/browse/IMPALA-11034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17567351#comment-17567351 ] 

ASF subversion and git services commented on IMPALA-11034:
----------------------------------------------------------

Commit 8d034a2f7cd2f68714e83b28335b3baf18823d7c in impala's branch refs/heads/master from Gergely Fürnstáhl
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=8d034a2f7 ]

IMPALA-11034: Resolve schema of old data files in migrated Iceberg tables

When external tables are converted to Iceberg, the data files remain
intact and therefore lack field IDs. Previously, Impala used name-based
column resolution in this case.

Added a feature that traverses the data files before column
resolution and assigns field IDs the same way Iceberg would, so that
field-ID-based column resolution can be used.
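
As a rough illustration of the ID assignment described above, the
following Python sketch numbers the fields of a nested schema. It assumes
Iceberg's fresh-ID ordering, in which all fields of a struct receive IDs
before their nested types are visited. The dict-based schema representation
and the assign_field_ids helper are hypothetical and are not taken from
the Impala patch; lists and maps are left out for brevity.

```python
from itertools import count


def assign_field_ids(fields, next_id=None):
    """Return a copy of `fields` with an "id" on every field.

    `fields` is a list of dicts: {"name": str, "type": str or list}.
    A list-valued "type" stands for a struct with nested fields.
    (Hypothetical representation, for illustration only.)
    """
    if next_id is None:
        next_id = count(1)  # assumed: IDs start at 1
    # Give IDs to every field at this level first.
    numbered = [
        {"id": next(next_id), "name": f["name"], "type": f["type"]}
        for f in fields
    ]
    # Then descend into nested structs, in field order.
    for f in numbered:
        if isinstance(f["type"], list):
            f["type"] = assign_field_ids(f["type"], next_id)
    return numbered


schema = [
    {"name": "id", "type": "int"},
    {"name": "location", "type": [
        {"name": "lat", "type": "double"},
        {"name": "lon", "type": "double"},
    ]},
    {"name": "name", "type": "string"},
]
numbered = assign_field_ids(schema)
```

Under that assumed ordering, the top-level columns id, location, and name
get IDs 1 to 3, and the nested lat and lon get 4 and 5.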

Testing:

The default resolution method was changed to field ID for migrated
tables; existing tests use it from now on.

Added new tests to cover edge cases with complex types and schema
evolution.

Change-Id: I77570bbfc2fcc60c2756812d7210110e8cc11ccc
Reviewed-on: http://gerrit.cloudera.org:8080/18639
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Resolve schema of old data files in migrated Iceberg tables
> -----------------------------------------------------------
>
>                 Key: IMPALA-11034
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11034
>             Project: IMPALA
>          Issue Type: Bug
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Gergely Fürnstáhl
>            Priority: Major
>              Labels: impala-iceberg
>
> When external tables are converted to Iceberg, the data files remain intact.
> This means that the old data files don't have field ID information, which is essential for schema evolution.
> However, there is a workaround for this, see: [https://github.com/trinodb/trino/issues/9843]
> We need to create a NameMapping which maps field ids to column names, then we can do column resolution in the legacy files with the help of the name mapping.
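
A minimal sketch of the name-mapping idea, in Python. The entries use the
"field-id", "names", and "fields" keys of Iceberg's name mapping format;
the helper functions, the dotted-path convention, and the sample columns
are hypothetical and only illustrate how a legacy file's column names
could be resolved to field IDs.

```python
def flatten_name_mapping(mapping, prefix=""):
    """Map each dotted column path to its field ID."""
    ids = {}
    for entry in mapping:
        for name in entry["names"]:
            path = prefix + name
            ids[path] = entry["field-id"]
            # Nested fields are reachable under every alias of the parent.
            ids.update(flatten_name_mapping(entry.get("fields", []), path + "."))
    return ids


def resolve_file_columns(file_columns, mapping):
    """Return {file column path: field ID, or None if unmapped}."""
    ids = flatten_name_mapping(mapping)
    return {col: ids.get(col) for col in file_columns}


# Hypothetical mapping: "lat" was once written as "latitude".
name_mapping = [
    {"field-id": 1, "names": ["id"]},
    {"field-id": 2, "names": ["location"], "fields": [
        {"field-id": 4, "names": ["lat", "latitude"]},
        {"field-id": 5, "names": ["lon"]},
    ]},
]

resolved = resolve_file_columns(
    ["id", "location.latitude", "extra"], name_mapping
)
```

Here the old column name location.latitude resolves to field ID 4, while
extra has no entry in the mapping and resolves to None.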



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
