Posted to issues-all@impala.apache.org by "Zoltán Borók-Nagy (Jira)" <ji...@apache.org> on 2022/06/13 13:09:00 UTC

[jira] [Assigned] (IMPALA-11331) Create Iceberg transactions earlier

     [ https://issues.apache.org/jira/browse/IMPALA-11331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy reassigned IMPALA-11331:
------------------------------------------

    Assignee: Tamas Mate

> Create Iceberg transactions earlier
> -----------------------------------
>
>                 Key: IMPALA-11331
>                 URL: https://issues.apache.org/jira/browse/IMPALA-11331
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>            Reporter: Zoltán Borók-Nagy
>            Assignee: Tamas Mate
>            Priority: Major
>              Labels: impala-iceberg
>
> Currently we create Iceberg transactions via IcebergUtil.getIcebergTransaction() in CatalogOpExecutor.
> This is problematic in some cases, especially for INSERT OVERWRITE, because we open the transaction too late: by the time we open and commit it, the data files are already written. Data added by INSERT statements that commit in the meantime gets overwritten, instead of the INSERT OVERWRITE operation failing.
> This is a real problem when we try to use INSERT OVERWRITE for compacting a table. In that case we definitely don't want to lose INSERTed data.
> Moving transaction open/close to the coordinator requires a lot of work, and the handling of self-events would become even more complicated.
> Alternatively, the Coordinator could initiate the transaction, i.e. ask CatalogD to open one, and at the end CatalogD would commit the opened transaction.
> We would also need to abort the transactions of failed queries, and have a way to abort transactions left open by crashed Coordinators.
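A minimal Java sketch of the CatalogD-side bookkeeping this alternative would need. Everything in it (`TxnRegistry`, the coordinator ID strings, the method names) is hypothetical and not part of Impala; it only illustrates opening a transaction at the Coordinator's request, committing or aborting it, and aborting whatever a crashed Coordinator left open. How a crash is actually detected is assumed to happen elsewhere and is out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: CatalogD tracks Iceberg transactions that Coordinators
// asked it to open. Not Impala code; it only models the open/commit/abort protocol.
class TxnRegistry {
  enum State { OPEN, COMMITTED, ABORTED }

  private static final class Txn {
    final String coordinatorId;
    State state = State.OPEN;
    Txn(String coordinatorId) { this.coordinatorId = coordinatorId; }
  }

  private final Map<Long, Txn> txns = new HashMap<>();
  private long nextId = 1;

  // Coordinator asks CatalogD to open a transaction before data files are written.
  synchronized long open(String coordinatorId) {
    long id = nextId++;
    txns.put(id, new Txn(coordinatorId));
    return id;
  }

  // At the end of a successful query CatalogD commits the opened transaction.
  synchronized void commit(long txnId) {
    requireOpen(txnId).state = State.COMMITTED;
  }

  // A failed query aborts its transaction.
  synchronized void abort(long txnId) {
    requireOpen(txnId).state = State.ABORTED;
  }

  // Called once a Coordinator is known to have crashed: aborts every
  // transaction it still has open and returns how many were aborted.
  synchronized int abortAllOf(String coordinatorId) {
    int aborted = 0;
    for (Txn t : txns.values()) {
      if (t.state == State.OPEN && t.coordinatorId.equals(coordinatorId)) {
        t.state = State.ABORTED;
        aborted++;
      }
    }
    return aborted;
  }

  synchronized State state(long txnId) {
    Txn t = txns.get(txnId);
    if (t == null) throw new IllegalArgumentException("unknown transaction " + txnId);
    return t.state;
  }

  // Only open transactions may be committed or aborted.
  private Txn requireOpen(long txnId) {
    Txn t = txns.get(txnId);
    if (t == null || t.state != State.OPEN) {
      throw new IllegalStateException("transaction " + txnId + " is not open");
    }
    return t;
  }
}
```

A real implementation would also have to hold the actual Iceberg transaction objects and survive CatalogD restarts; this sketch tracks only the states.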
>  
> UPDATE: Newer Iceberg releases will have an API to check for concurrent writes: [https://github.com/apache/iceberg/blob/9ab94f87de036c9cd91cf8353906a576b4a516ff/api/src/main/java/org/apache/iceberg/ReplacePartitions.java#L28-L34]
> Probably the most straightforward approach is to use this API: save the current snapshot ID at the coordinator during planning, then propagate it to CatalogD in TIcebergOperationParam.
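A minimal, library-free Java sketch of the snapshot-based check described above. `SimTable` is a hypothetical in-memory stand-in for an Iceberg table's snapshot history, not Iceberg's API. The coordinator records `currentSnapshotId()` during planning; when the INSERT OVERWRITE commits, every snapshot committed after that base is inspected, and the operation fails if any of them added data to a partition being replaced. In the Iceberg Java API, `ReplacePartitions` offers validation methods such as `validateFromSnapshot(long)` and `validateNoConflictingData()` for this purpose; check the documentation of the Iceberg version in use for their exact semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of an Iceberg table's snapshot history.
// It does NOT use the Iceberg library; it only models the conflict check.
class SimTable {
  // One committed snapshot: its ID and the partitions it added data files to.
  static final class Snapshot {
    final long id;
    final Set<String> addedPartitions;
    Snapshot(long id, Set<String> addedPartitions) {
      this.id = id;
      this.addedPartitions = addedPartitions;
    }
  }

  private final List<Snapshot> history = new ArrayList<>();
  private long nextSnapshotId = 1;

  // Current snapshot ID (saved by the coordinator at planning), or -1 if empty.
  synchronized long currentSnapshotId() {
    return history.isEmpty() ? -1 : history.get(history.size() - 1).id;
  }

  // Plain INSERT: commits unconditionally.
  synchronized long append(Set<String> partitions) {
    long id = nextSnapshotId++;
    history.add(new Snapshot(id, Set.copyOf(partitions)));
    return id;
  }

  // INSERT OVERWRITE of 'partitions', planned against 'baseSnapshotId'.
  // Walks back from the newest snapshot to the base one (real Iceberg snapshot
  // IDs are not guaranteed to be increasing, so we follow history instead of
  // comparing IDs) and fails if a newer snapshot added data to a replaced partition.
  synchronized long replacePartitions(long baseSnapshotId, Set<String> partitions) {
    for (int i = history.size() - 1; i >= 0 && history.get(i).id != baseSnapshotId; i--) {
      Snapshot s = history.get(i);
      for (String p : s.addedPartitions) {
        if (partitions.contains(p)) {
          throw new IllegalStateException(
              "Conflicting INSERT into partition " + p + " in snapshot " + s.id);
        }
      }
    }
    // No conflicting writes since planning: commit the overwrite.
    return append(partitions);
  }
}
```

With this check, an INSERT that lands in a replaced partition between planning and commit makes the INSERT OVERWRITE fail instead of being overwritten, while INSERTs into other partitions do not block it.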



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
