Posted to issues@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2023/12/15 12:56:00 UTC

[jira] [Created] (IMPALA-12640) Remove IcebergDeleteSink

Zoltán Borók-Nagy created IMPALA-12640:
------------------------------------------

             Summary: Remove IcebergDeleteSink
                 Key: IMPALA-12640
                 URL: https://issues.apache.org/jira/browse/IMPALA-12640
             Project: IMPALA
          Issue Type: Bug
          Components: Backend, Frontend
            Reporter: Zoltán Borók-Nagy


UPDATE part 3 CR (https://gerrit.cloudera.org/#/c/20760/) introduces a new sink operator for position delete records: IcebergBufferedDeleteSink.

The new operator can be used for UPDATEs even when a partition column value is updated or the table has SORT BY properties.

IcebergBufferedDeleteSink doesn't require its input to be sorted by delete partitions, file paths, and positions, as it takes care of the ordering itself.
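
To illustrate why a buffering sink does not need pre-sorted input, here is a minimal Python sketch. It is not Impala's implementation (the real operator lives in the C++ backend); the class name BufferedDeleteSketch and its methods are hypothetical. It only shows the general idea: collect positions per partition and file path in any order, then emit them sorted at flush time.

```python
from collections import defaultdict


class BufferedDeleteSketch:
    """Hypothetical stand-in for a buffering delete sink (not Impala code)."""

    def __init__(self):
        # partition -> file path -> list of row positions, in arrival order
        self._buf = defaultdict(lambda: defaultdict(list))

    def add(self, partition, file_path, pos):
        # Rows may arrive in any order; no upstream sort is assumed.
        self._buf[partition][file_path].append(pos)

    def flush(self):
        """Yield (partition, file_path, pos) ordered by all three fields."""
        for partition in sorted(self._buf):
            files = self._buf[partition]
            for file_path in sorted(files):
                for pos in sorted(files[file_path]):
                    yield partition, file_path, pos


sink = BufferedDeleteSketch()
sink.add("p=2", "b.parq", 5)
sink.add("p=1", "a.parq", 9)
sink.add("p=1", "a.parq", 3)
ordered = list(sink.flush())
```

Because each file path is kept once as a key and only the integer positions grow with the number of deleted rows, this layout is also what keeps the buffer small.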

The only area where IcebergBufferedDeleteSink lags behind IcebergDeleteSink is that it cannot spill to disk. But since it stores file paths and positions in a compact format, it is unlikely that it would ever need to spill to disk in a real-life situation. E.g., even if 100M rows need to be deleted per Impala executor, the amount of memory required is not much larger than 800 MB per executor.
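
The 800 MB figure can be reproduced with a rough back-of-the-envelope estimate. The sketch below assumes 8 bytes per stored position (a 64-bit integer) and that each file path is stored once per data file; the function name, the number of files, and the average path length are illustrative assumptions, not values from the change request.

```python
def estimate_delete_buffer_bytes(num_rows, num_files,
                                 avg_path_len=200, bytes_per_position=8):
    """Rough estimate only; ignores container and allocator overhead."""
    positions = num_rows * bytes_per_position  # one 64-bit position per deleted row
    paths = num_files * avg_path_len           # each file path stored once (assumed)
    return positions + paths


# 100M deleted rows spread over an assumed 10,000 data files
total = estimate_delete_buffer_bytes(100_000_000, 10_000)
print(total / 1_000_000, "MB")
```

Under these assumptions the positions dominate at 800 MB, and the file paths add only about 2 MB on top.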



--
This message was sent by Atlassian Jira
(v8.20.10#820010)