Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/20 04:19:29 UTC

[GitHub] [iceberg] aokolnychyi commented on pull request #1469: Add position delete validation that data files have not been deleted

aokolnychyi commented on pull request #1469:
URL: https://github.com/apache/iceberg/pull/1469#issuecomment-695686623


   I think this PR raises a very good point that we haven't considered for merge-on-read but already have for copy-on-write.
   
   This PR looks good to me, but I want us to think through which validations we will eventually need. Let's consider the following use cases: DELETE and UPDATE with positional deletes, and DELETE and UPDATE with equality deletes. Each operation may run under different isolation levels: serializable and snapshot isolation (there can be more, but let's skip that for now).
   
   **DELETE with positional deletes**
   
   | Isolation | Validation |
   | --------- | ------------- |
   | serializable  | - no new potentially matching data files since we read<br/> - data files referenced by new deletes must still be present<br/> - no validation on delete files, as it is ok if the row was deleted concurrently |
   | snapshot  | - data files referenced by new deletes must still be present<br/> - no validation on new potentially matching data files since we read<br/> - no validation on delete files, as it is ok if the row was deleted concurrently |
   
   
   **UPDATE with positional deletes**
   
   | Isolation | Validation |
   | --------- | ------------- |
   | serializable  | - no new potentially matching data files since we read<br/> - no new potentially matching delete files, as it is NOT ok if the row was deleted concurrently<br/> - data files referenced by new deletes must still be present |
   | snapshot  | - no new potentially matching delete files, as it is NOT ok if the row was deleted concurrently<br/> - data files referenced by new deletes must still be present<br/> - no validation on new potentially matching data files since we read |
   
   **DELETE with equality deletes**
   
   | Isolation | Validation |
   | --------- | ------------- |
   | serializable  | - no new potentially matching data files since we read<br/> - no validation on delete files, as it is ok if the row was deleted concurrently |
   | snapshot  | - no validation on new potentially matching data files since we read<br/> - no validation on delete files, as it is ok if the row was deleted concurrently |
   
   
   **UPDATE with equality deletes**
   
   | Isolation | Validation |
   | --------- | ------------- |
   | serializable  | - no validation on new potentially matching data files, since we don't have to read the table<br/> - no validation on new potentially matching delete files, as we don't have to read the table |
   | snapshot  | - no validation on new potentially matching data files, since we don't have to read the table<br/> - no validation on new potentially matching delete files, as we don't have to read the table |
   
   Does this seem correct?
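
To make the proposal easier to check, the four tables above can be condensed into a single lookup. The following is only a sketch of that matrix as plain Java. The class, enum, and method names (`DeleteValidationMatrix`, `Check`, `requiredChecks`) are hypothetical and are not part of the Iceberg API; the logic mirrors the rows as written in this comment and nothing more.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: encodes the validation tables from this comment.
// None of these names exist in Iceberg; they are for illustration only.
final class DeleteValidationMatrix {

  enum Operation { DELETE, UPDATE }

  enum DeleteKind { POSITION, EQUALITY }

  enum Isolation { SERIALIZABLE, SNAPSHOT }

  enum Check {
    NO_NEW_MATCHING_DATA_FILES,          // no new potentially matching data files since we read
    NO_NEW_MATCHING_DELETE_FILES,        // a concurrent delete of the same row is NOT ok
    REFERENCED_DATA_FILES_STILL_PRESENT  // data files referenced by new deletes must still exist
  }

  static Set<Check> requiredChecks(Operation op, DeleteKind kind, Isolation isolation) {
    Set<Check> checks = EnumSet.noneOf(Check.class);

    // UPDATE with equality deletes does not have to read the table,
    // so the tables list no validation for either isolation level.
    if (op == Operation.UPDATE && kind == DeleteKind.EQUALITY) {
      return checks;
    }

    // Only serializable isolation rejects concurrently added matching data files.
    if (isolation == Isolation.SERIALIZABLE) {
      checks.add(Check.NO_NEW_MATCHING_DATA_FILES);
    }

    if (kind == DeleteKind.POSITION) {
      // Position deletes point at concrete data files, which must not have been removed.
      checks.add(Check.REFERENCED_DATA_FILES_STILL_PRESENT);

      // An UPDATE must not lose its row to a concurrent delete; a DELETE does not care.
      if (op == Operation.UPDATE) {
        checks.add(Check.NO_NEW_MATCHING_DELETE_FILES);
      }
    }

    return checks;
  }

  public static void main(String[] args) {
    // Print every cell of the matrix so it can be compared with the tables.
    for (DeleteKind kind : DeleteKind.values()) {
      for (Operation op : Operation.values()) {
        for (Isolation isolation : Isolation.values()) {
          System.out.println(op + " / " + kind + " / " + isolation
              + " -> " + requiredChecks(op, kind, isolation));
        }
      }
    }
  }
}
```

Written this way, the only difference between the two isolation levels is the check on new matching data files, and the only difference between DELETE and UPDATE with positional deletes is the check on new matching delete files.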

