Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/09/19 11:19:00 UTC

[jira] [Resolved] (IMPALA-11484) Create SCAN plan for Iceberg V2 position delete tables

     [ https://issues.apache.org/jira/browse/IMPALA-11484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-11484.
----------------------------------------
    Fix Version/s: Impala 4.2.0
       Resolution: Fixed

> Create SCAN plan for Iceberg V2 position delete tables
> ------------------------------------------------------
>
>                 Key: IMPALA-11484
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11484
>             Project: IMPALA
>          Issue Type: New Feature
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>             Fix For: Impala 4.2.0
>
>
> Iceberg position delete files store the full URIs and file positions of the rows that are deleted. Therefore we can do an ANTI HASH JOIN between data files and delete files to retrieve only the active rows.
> For the data file rows we need to get the virtual columns INPUT_FILE_NAME and FILE_POSITION, while in the delete files we need to retrieve the columns 'file_path' and 'pos': https://iceberg.apache.org/spec/#position-delete-files
> Since the data files use the table schema, and the delete files use a different schema, we need to create a virtual table for the delete files with the corresponding schema.
> Iceberg tells us which delete files must be applied to which data files, i.e. if a data file doesn't have a corresponding delete file, its content can simply be UNION'ed with the output of the ANTI HASH JOIN.
> See more information in the design doc: https://docs.google.com/document/d/1WF_UOanQ61RUuQlM4LaiRWI0YXpPKZ2VEJ8gyJdDyoY/edit#heading=h.5gc49pcc2543
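
The plan described above can be illustrated with a minimal in-memory sketch in Python. This is not Impala's implementation: the names DataRow, PositionDelete, and scan_with_position_deletes are hypothetical, data files are modeled as plain lists, and positions are assumed to be 0-based ordinals as in the Iceberg spec. Data files that have delete files go through the anti join on (INPUT_FILE_NAME, FILE_POSITION) = (file_path, pos); the remaining files are appended unchanged, which plays the role of the UNION.

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; not Impala classes.
# DataRow carries the two virtual columns plus the row payload.
DataRow = namedtuple("DataRow", ["input_file_name", "file_position", "value"])
# PositionDelete mirrors the 'file_path' and 'pos' columns of a delete file.
PositionDelete = namedtuple("PositionDelete", ["file_path", "pos"])


def scan_with_position_deletes(data_files, position_deletes, files_with_deletes):
    """Return the active rows of a table.

    data_files: dict mapping a data file URI to its list of row values.
    position_deletes: iterable of PositionDelete records.
    files_with_deletes: set of data file URIs that have delete files
        (in the real system Iceberg provides this mapping).
    """
    # Build side of the anti join: the set of deleted (file, position) pairs.
    deleted = {(d.file_path, d.pos) for d in position_deletes}

    result = []
    for uri, values in data_files.items():
        # Attach the virtual columns to every row of the data file.
        rows = [DataRow(uri, pos, v) for pos, v in enumerate(values)]
        if uri in files_with_deletes:
            # Anti join: keep only rows without a matching delete record.
            result.extend(
                r for r in rows
                if (r.input_file_name, r.file_position) not in deleted
            )
        else:
            # No delete files for this data file: union its rows as-is.
            result.extend(rows)
    return result
```

For example, with a data file holding three rows and a single delete record for position 1 of that file, the sketch keeps positions 0 and 2 and passes every row of any other data file through untouched.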



--
This message was sent by Atlassian Jira
(v8.20.10#820010)