Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/24 23:12:02 UTC

[GitHub] [iceberg] RussellSpitzer commented on pull request #2925: Core: Support serializable isolation for ReplacePartitions

RussellSpitzer commented on pull request #2925:
URL: https://github.com/apache/iceberg/pull/2925#issuecomment-1050351570


   > @RussellSpitzer @flyrain this is the issue we talked about, if any thoughts on what is 'snapshot' isolation once we define 'serializable' isolation.
   
   https://jepsen.io/consistency // Jepsen's Doc provides a good overview
   https://jepsen.io/consistency/models/serializable 
   
   
   Repeatable Read is a bit different from Snapshot Isolation:
   https://jepsen.io/consistency/models/repeatable-read
   ```Repeatable read is closely related to [serializability](https://jepsen.io/consistency/models/serializable), but unlike serializable, it allows [phantoms](http://pmg.csail.mit.edu/papers/adya-phd.pdf): if a transaction T1 reads a predicate, like "the set of all people with the name “Dikembe”, then another transaction T2 may create or modify a person with the name “Dikembe” before T1 commits. Individual objects are stable once read, but the predicate itself may not be.```
    
   https://jepsen.io/consistency/models/snapshot-isolation
   ```In a snapshot isolated system, each transaction appears to operate on an independent, consistent snapshot of the database. Its changes are visible only to that transaction until commit time, when all changes become visible atomically. If transaction T1 has modified an object x, and another transaction T2 committed a write to x after T1’s snapshot began, and before T1’s commit, then T1 must abort.```
   
   I'm pretty sure Postgres's definition of Repeatable Read actually fits Snapshot Isolation better than Repeatable Read, since [they don't allow Phantom Reads](https://www.postgresql.org/docs/13/transaction-iso.html#MVCC-ISOLEVEL-TABLE), which are allowed under true Repeatable Read isolation. Snapshot Isolation, in my mind, says you cannot modify records that were changed by a previous commit, but you may modify records that the previous commit did not touch, while ignoring the changes that commit produced.
   
   Stolen from the [SQL Server Blog](https://techcommunity.microsoft.com/t5/sql-server-blog/serializable-vs-snapshot-isolation-level/ba-p/383281):
   ![image](https://user-images.githubusercontent.com/413025/155622462-2a862637-a089-41fd-ac73-2032de7875fd.png)
   
   Here, imagine two commits:
   "UPDATE color = white WHERE color = black"
   "UPDATE color = black WHERE color = white"
   
   Both of these commits are allowed to occur as if they were applied to the same original commit, because each operation only affected an isolated set of marbles.
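
To make the marble example concrete, here is a minimal sketch of how I picture it. It is a toy model, not Iceberg or SQL Server code, and every name in it (`Txn`, `write_set`, `commit_snapshot_isolation`, `commit_serial`) is made up for illustration. Under the snapshot-isolation rule, both updates read the same base snapshot and only abort on overlapping write sets; under serial execution, the second update sees the result of the first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Txn:
    name: str
    match: str  # color selected by the WHERE clause
    new: str    # color written by the UPDATE


def write_set(snapshot, txn):
    """Rows the UPDATE would change when evaluated against `snapshot`."""
    return {i: txn.new for i, color in snapshot.items() if color == txn.match}


def commit_snapshot_isolation(base, txns):
    """Evaluate every transaction against the same base snapshot.

    A transaction is aborted only if an earlier committer already wrote
    one of the rows it wants to write (write-write conflict).
    """
    state = dict(base)
    written = set()
    committed = []
    for txn in txns:
        ws = write_set(base, txn)
        if written.intersection(ws):
            continue  # conflict: this transaction must abort
        state.update(ws)
        written.update(ws)
        committed.append(txn.name)
    return state, committed


def commit_serial(base, txns):
    """Run the transactions one after another, each seeing the previous result."""
    state = dict(base)
    for txn in txns:
        state.update(write_set(state, txn))
    return state


marbles = {1: "black", 2: "black", 3: "white", 4: "white"}
to_white = Txn("to_white", match="black", new="white")
to_black = Txn("to_black", match="white", new="black")

si_state, si_committed = commit_snapshot_isolation(marbles, [to_white, to_black])
serial_state = commit_serial(marbles, [to_white, to_black])
```

In this sketch both transactions commit under snapshot isolation and the colors end up swapped, whereas the serial run leaves every marble the same color. That difference is the anomaly serializable isolation rules out.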
   
   So in this case I believe INSERT OVERWRITE would never conflict with another INSERT, but would conflict with an update that changed any row within the partition being overwritten. Another INSERT OVERWRITE would be a form of update, so I believe @szehon-ho has the right of it here.
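
The rule in the paragraph above can be sketched as a small predicate. This is only my reading written out as a toy model, not Iceberg's actual validation code; the `Commit` class, the `kind` labels, and `conflicts_with_overwrite` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    kind: str       # hypothetical labels: "insert", "update", "overwrite"
    partition: str  # partition the commit touched


def conflicts_with_overwrite(overwrite, concurrent):
    """Would `concurrent` conflict with an INSERT OVERWRITE under snapshot isolation?

    Assumed rule: plain inserts never conflict; updates and other
    overwrites conflict only when they touch the overwritten partition.
    """
    if concurrent.partition != overwrite.partition:
        return False
    return concurrent.kind in {"update", "overwrite"}


overwrite = Commit("overwrite", partition="day=2022-02-24")
same_day_insert = Commit("insert", partition="day=2022-02-24")
same_day_update = Commit("update", partition="day=2022-02-24")
other_day_update = Commit("update", partition="day=2022-02-23")
```

With these definitions, `conflicts_with_overwrite(overwrite, same_day_insert)` is false, while the same call with `same_day_update` or with a second overwrite of the same partition is true.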
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


