Posted to issues@iceberg.apache.org by "HonahX (via GitHub)" <gi...@apache.org> on 2024/04/06 06:41:11 UTC

[I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

HonahX opened a new issue, #588:
URL: https://github.com/apache/iceberg-python/issues/588

   ### Question
   
   Currently, the HiveCatalog's `_commit_table` workflow looks like:
   
   1. load current table metadata via `load_table`
   2. construct updated metadata
   3. lock the Hive table
   4. alter the Hive table
   5. unlock the Hive table
   
   Suppose two processes, A and B, try to commit changes to the same Iceberg table. The code execution may happen in the following order:
   
   1. process A loads the current table metadata
   2. process A constructs the updated metadata
   3. process B starts and finishes the **whole** `_commit_table`
   4. process A locks the Hive table
   5. process A alters the Hive table
   6. process A unlocks the Hive table
   
   In this specific scenario, both processes successfully commit their changes because process B releases the lock before A tries to acquire it. But if `alter_table` does not support the [transactional check](https://issues.apache.org/jira/browse/HIVE-26882), the changes made by process B will be overwritten, because process A built its update from metadata that is already stale by the time it takes the lock.
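
The interleaving above can be reproduced deterministically with a toy model. This is only a sketch: `FakeMetastore`, `lock_alter_unlock`, and `simulate_lost_update` are hypothetical names invented for illustration, not PyIceberg or Hive APIs, and the fake `alter_table` is assumed to perform no transactional check (that is, a metastore without HIVE-26882).

```python
from dataclasses import dataclass


@dataclass
class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive table (not a real client)."""

    metadata: tuple = ()
    locked: bool = False

    def load_table(self):
        return self.metadata

    def lock(self):
        assert not self.locked, "table is already locked"
        self.locked = True

    def alter_table(self, new_metadata):
        # Assumption: no check that the stored metadata still matches
        # what the writer originally loaded.
        self.metadata = new_metadata

    def unlock(self):
        self.locked = False


def lock_alter_unlock(store, updated):
    """Steps 3-5 of the workflow: the lock only covers the write."""
    store.lock()
    try:
        store.alter_table(updated)
    finally:
        store.unlock()


def simulate_lost_update():
    store = FakeMetastore()
    base_a = store.load_table()        # 1. A loads the current metadata
    updated_a = base_a + ("A",)        # 2. A constructs the updated metadata
    # 3. B starts and finishes the whole commit
    lock_alter_unlock(store, store.load_table() + ("B",))
    # 4-6. A locks, alters, and unlocks using its stale update
    lock_alter_unlock(store, updated_a)
    return store.metadata
```

In this model neither process ever fails to acquire the lock, yet the table ends up holding only A's change.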
   
   Since in Python we do not know which Hive version we are connecting to, I wonder if we need to update the code to lock the table before loading the current table metadata, like what the [Java implementation](https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L184) does.
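
A minimal sketch of the lock-first ordering, under the same toy assumptions: `FakeMetastore`, `commit_lock_first`, and `run_concurrent_commits` are hypothetical, and a `threading.Lock` stands in for the Hive table lock. It is not the PyIceberg or Java implementation; it only illustrates that taking the lock before the read keeps the read-modify-write sequence atomic.

```python
import threading


class FakeMetastore:
    """Hypothetical in-memory stand-in for a Hive table (not a real client)."""

    def __init__(self):
        self.metadata = ()
        # Assumption: an in-process lock models the Hive table lock.
        self.table_lock = threading.Lock()

    def load_table(self):
        return self.metadata

    def alter_table(self, new_metadata):
        # Still no transactional check; correctness relies on the lock.
        self.metadata = new_metadata


def commit_lock_first(store, change):
    """Lock first, then load, update, and alter while holding the lock."""
    with store.table_lock:
        base = store.load_table()
        store.alter_table(base + (change,))


def run_concurrent_commits(changes):
    store = FakeMetastore()
    threads = [
        threading.Thread(target=commit_lock_first, args=(store, change))
        for change in changes
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return store.metadata
```

Whichever thread wins the lock, the other one loads the metadata only after the first commit has landed, so both changes survive; only their order may differ between runs.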
   
   BTW, it seems there is a consistency issue with https://issues.apache.org/jira/browse/HIVE-26882 as well, and there is an open fix for it: https://github.com/apache/hive/pull/5129
   
   Please correct me if I misunderstand something here. Thanks!
   
   cc: @Fokko 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko commented on issue #588:
URL: https://github.com/apache/iceberg-python/issues/588#issuecomment-2056289982

   @HonahX Thanks for spotting this, and I agree that we should include the refreshing and updating of the metadata in the transaction. 




Re: [I] Concern about possible consistency issue in HiveCatalog's _commit_table [iceberg-python]

Posted by "Fokko (via GitHub)" <gi...@apache.org>.
Fokko closed issue #588: Concern about possible consistency issue in HiveCatalog's _commit_table
URL: https://github.com/apache/iceberg-python/issues/588

