You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2022/09/02 11:07:00 UTC
[jira] [Commented] (IMPALA-11516) Create better cardinality estimates for Iceberg V2 tables with deletes

    [ https://issues.apache.org/jira/browse/IMPALA-11516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17599418#comment-17599418 ] 

ASF subversion and git services commented on IMPALA-11516:
----------------------------------------------------------

Commit 73da4d7ddfbcec33c68cd46986b35d7557dd7e67 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=73da4d7dd ]

IMPALA-11484: Create SCAN plan for Iceberg V2 position delete tables

This patch adds support for reading Iceberg V2 tables use position
deletes. Equality deletes are still not supported. Position delete
files store the file path and file position of the deleted rows.

When an Iceberg table has position delete files we need to do an
ANTI JOIN between data files and delete files. From the data files
we need to query the virtual columns INPUT__FILE__NAME and
FILE__POSITION, while from the delete files we need the data columns
'file_path' and 'pos'. The latter data columns are not part of the
table schema, so we create a virtual table instance of
'IcebergPositionDeleteTable' that has a table schema corresponding
to the delete files ('file_path', 'pos').

This patch introduces a new class 'IcebergScanPlanner' which has
the responsibility of doing a plan for Iceberg table scans. It creates
the aforementioned ANTI JOIN. Also, if there are data files without
corresponding delete files, we can have a separate SCAN node and its
results would be UNIONed to the rows coming from the ANTI JOIN:

              UNION
             /     \
    SCAN data       ANTI JOIN
                     /      \
              SCAN data    SCAN deletes

Some refactorings in the context of this CR:
Predicate pushdown and time travel logic is transferred from
IcebergScanNode to IcebergScanPlanner. Iceberg snapshot summary
retrieval is moved from FeFsTable to FeIcebergTable.

Testing:
 * added planner test
 * added e2e tests

TODO in follow-up Jiras:
 * better cardinality estimates (IMPALA-11516)
 * support unrelative collection columns (select item from t.int_array)
   (IMPALA-11517)
   Currently such queries return error during analysis

Change-Id: I672cfee18d8e131772d90378d5b12ad4d0f7dd48
Reviewed-on: http://gerrit.cloudera.org:8080/18847
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


> Create better cardinality estimates for Iceberg V2 tables with deletes
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-11516
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11516
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Zoltán Borók-Nagy
>            Priority: Major
>              Labels: impala-iceberg
>
> Create better cardinality estimates for Iceberg V2 tables with deletes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org