Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/22 01:20:56 UTC

[GitHub] [iceberg] kbendick commented on pull request #4795: expose the latest snapshot id committed within a thread

kbendick commented on PR #4795:
URL: https://github.com/apache/iceberg/pull/4795#issuecomment-1133796964

   Hi @CodingCat 
   
   I’d like to understand the problem you’re trying to solve a bit better.
   
   > Because currentSnapshot() will trigger the refresh of metadata and may show the snapshot id committed by someone else in another concurrent thread eventually
   
   As mentioned on Slack, the metadata refresh on commit exists to ensure that the table’s state is still the same as it was when the write was prepared. This is how ACID compliance is achieved.
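   To make the refresh-on-commit idea concrete, here is a minimal, self-contained Python sketch of optimistic concurrency under simplified assumptions. `ToyTable`, `Metadata`, `CommitFailed` and `commit_append` are hypothetical names made up for illustration, not Iceberg’s API. The table’s only mutable state is a pointer to immutable metadata. A writer prepares its change against the metadata it last read, and the commit is an atomic compare-and-swap that fails if someone else committed in between. On failure, the writer refreshes and re-applies its change.

```python
import itertools
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Metadata:
    """Immutable table metadata; in this toy, just the snapshot ids, oldest first."""

    snapshot_ids: tuple = ()

    @property
    def current_snapshot_id(self):
        return self.snapshot_ids[-1] if self.snapshot_ids else None


class CommitFailed(Exception):
    """Raised when the table changed between reading the base and committing."""


class ToyTable:
    """Hypothetical table whose only mutable state is a pointer to the current metadata."""

    def __init__(self):
        self._metadata = Metadata()
        self._lock = threading.Lock()  # makes swap() an atomic compare-and-swap
        self._ids = itertools.count(1)

    def refresh(self):
        """Return the latest committed metadata."""
        return self._metadata

    def new_snapshot_id(self):
        with self._lock:
            return next(self._ids)

    def swap(self, base, updated):
        """Replace base with updated, failing if another writer committed first."""
        with self._lock:
            if self._metadata is not base:
                raise CommitFailed("table changed since base was read")
            self._metadata = updated


def commit_append(table, max_retries=10):
    """Append one snapshot with optimistic retries; return the id this writer committed."""
    snapshot_id = table.new_snapshot_id()
    for _ in range(max_retries):
        base = table.refresh()  # the state this attempt is prepared against
        updated = Metadata(base.snapshot_ids + (snapshot_id,))
        try:
            table.swap(base, updated)
            return snapshot_id
        except CommitFailed:
            continue  # another writer won: refresh and re-apply on top of its commit
    raise CommitFailed(f"gave up after {max_retries} attempts")


if __name__ == "__main__":
    table = ToyTable()
    committed = []
    writers = [
        threading.Thread(target=lambda: committed.append(commit_append(table)))
        for _ in range(8)
    ]
    for w in writers:
        w.start()
    for w in writers:
        w.join()
    # Every writer's snapshot survives, and each writer knows which id it committed.
    print(sorted(committed) == sorted(table.refresh().snapshot_ids))
```

   Because the swap checks the base and the change is re-applied on top of whatever was committed in the meantime, no writer’s snapshot is lost in this toy. The value returned by `commit_append` identifies that writer’s own snapshot, whatever the table’s current snapshot happens to be afterwards.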
   
   I’m not sure I understand what you’re trying to achieve. I know you’d like to expose the snapshotId as it was when the current thread (or, let’s just say, writer) prepared its write, i.e. prior to the commit. But what do you intend to do with that information?
   
   > I think the scenario is more pervasive than our own case, e.g. each notebook attached to the Databricks' notebook cluster is basically handled by a thread. In such a scenario, users may fall into some race condition to get the snapshot id committed by their own notebook with just currentSnapshot().snapshotId
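   The race described in the quote above can be reproduced deterministically with a toy model. This is a sketch with hypothetical names (`ToyTable`, `commit`, `current_snapshot_id`), not Iceberg code. Two notebooks commit to the same table back to back, and then the first notebook asks the table for the current snapshot.

```python
class ToyTable:
    """Hypothetical in-memory table: committed snapshot ids, newest last."""

    def __init__(self):
        self.snapshot_ids = []
        self._next_id = 0

    def commit(self):
        """Commit a new snapshot and return its id."""
        self._next_id += 1
        self.snapshot_ids.append(self._next_id)
        return self._next_id

    def current_snapshot_id(self):
        """Latest snapshot id, regardless of which writer committed it."""
        return self.snapshot_ids[-1]


table = ToyTable()
notebook_a = table.commit()  # notebook A commits snapshot 1
notebook_b = table.commit()  # notebook B commits snapshot 2 before A looks up "its" snapshot

# A asks for the current snapshot and gets B's id instead of its own.
print(table.current_snapshot_id() == notebook_a)  # False
print(table.current_snapshot_id() == notebook_b)  # True
```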
   
   What catalog are you using? You mention Databricks, and most people I’ve encountered using Iceberg on Databricks use the `HadoopCatalog`, which should _not_ be used in a production environment: there’s no locking mechanism to ensure that only one writer at a time can update the current snapshot (be it across threads or across Spark applications).
   
   It sounds like you may be trying to work around the lack of a lock, but I worry that you’ll end up with conflicting writes that clobber the previous writer’s work.
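   Here is a simplified, hypothetical model of that clobbering risk. Two writers read the same base state, and then each commits its own snapshot. The illustrative `blind_commit` overwrites the table pointer unconditionally, which is roughly what can happen when nothing serializes the update. The illustrative `checked_commit` refuses to commit against a stale base. This is only an illustration, not how `HadoopCatalog` is actually implemented.

```python
class CommitFailed(Exception):
    """Raised by checked_commit when another writer committed first."""


class ToyTable:
    """Hypothetical table state: the snapshot ids reachable from the current metadata."""

    def __init__(self):
        self.snapshot_ids = []

    def blind_commit(self, base, snapshot_id):
        # No lock and no check: overwrite the pointer with base + our snapshot.
        self.snapshot_ids = base + [snapshot_id]

    def checked_commit(self, base, snapshot_id):
        # Only commit if nobody committed since `base` was read. A real catalog must make
        # this check-and-swap atomic (e.g. with a lock); this single-threaded toy does not.
        if self.snapshot_ids != base:
            raise CommitFailed("table changed since base was read")
        self.snapshot_ids = base + [snapshot_id]


# Two writers read the same base, then commit one after the other.
unsafe = ToyTable()
base_a = list(unsafe.snapshot_ids)
base_b = list(unsafe.snapshot_ids)
unsafe.blind_commit(base_a, 1)
unsafe.blind_commit(base_b, 2)
print(unsafe.snapshot_ids)  # [2]: writer A's snapshot has been clobbered

safe = ToyTable()
base_a = list(safe.snapshot_ids)
base_b = list(safe.snapshot_ids)
safe.checked_commit(base_a, 1)
try:
    safe.checked_commit(base_b, 2)
except CommitFailed:
    # Writer B must refresh and re-apply its change instead of overwriting A's.
    safe.checked_commit(list(safe.snapshot_ids), 2)
print(safe.snapshot_ids)  # [1, 2]
```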
   
   What do you intend to do with this thread-local snapshot ID, particularly once another writer has made it outdated?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org