Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/04/01 18:04:00 UTC

[GitHub] [iceberg] rdblue commented on pull request #2354: Core: add row key to format v2

rdblue commented on pull request #2354:
URL: https://github.com/apache/iceberg/pull/2354#issuecomment-812077531


   I think this is now unrelated to this PR, but since discussion is happening here I want to mention it:
   
   > I have seen scenarios when users want to roll back the table state completely rather than just the current snapshot. I think that should be done by replacing the current pointer in the catalog with an old JSON file rather than by calling the table rollback API.
   
   I don't agree with the direction of adding an API to roll back the JSON file itself. That approach discards relevant history, like the fact that after `t5`, the table had a bad snapshot. That history is relevant and valuable. Here are a few examples:
   
   1. Users can see how long the bad snapshot was the current snapshot
   2. Users can see what the snapshot ID was and find out which jobs read the bad snapshot
   3. Users can see what other changes were rolled back at the same time (e.g. a compaction was also rolled back)
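
To make the first two points concrete, here is a minimal toy model in Python. It is not Iceberg code: `SnapshotLog`, `set_current`, and `time_as_current` are hypothetical names, and timestamps are plain integers. It only illustrates that when a rollback is recorded as a new log entry, the bad snapshot's ID and its time as current remain queryable.

```python
from dataclasses import dataclass, field


@dataclass
class SnapshotLog:
    """Toy stand-in for a table's snapshot log (hypothetical, not the Iceberg API)."""
    entries: list = field(default_factory=list)  # (timestamp, snapshot_id) pairs

    def set_current(self, timestamp, snapshot_id):
        # Every change of the current snapshot, including a rollback, appends an entry.
        self.entries.append((timestamp, snapshot_id))

    def time_as_current(self, snapshot_id):
        """Total time the snapshot spent as current, summed over closed intervals."""
        total = 0
        for (start, sid), (end, _) in zip(self.entries, self.entries[1:]):
            if sid == snapshot_id:
                total += end - start
        return total


log = SnapshotLog()
log.set_current(1, "s1")  # good snapshot
log.set_current(5, "s2")  # bad snapshot committed at t5
log.set_current(8, "s1")  # rollback recorded as a new entry at t8

bad_duration = log.time_as_current("s2")
seen_ids = [sid for _, sid in log.entries]
```

Replacing the catalog pointer with the older JSON file would instead leave a log that ends at `s1` with no trace of `s2`, so neither question could be answered.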
   
   If we want to support this use case, then I think we need to make an API that will roll back a table to some point in time. That would roll back the snapshot (preserving the snapshot log) and revert metadata changes. We could do this by having a rollback API that actually uses a transaction to make multiple different changes. I've been thinking about updating the `SnapshotManager` to use a transaction as well, which would allow cherry-picking multiple commits as a single operation.
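
A rough sketch of that idea, again as a toy Python model rather than Iceberg code. `TableState`, `commit`, and `rollback_to_time` are hypothetical names, the property values are only illustrative, and the "transaction" is reduced to a single function that applies several changes together. The point is that the rollback moves forward: it restores the snapshot and the metadata that were current at the target time in one commit, and both logs only grow.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableState:
    """Toy table state (hypothetical; not Iceberg's TableMetadata)."""
    current_snapshot: Optional[str] = None
    properties: dict = field(default_factory=dict)
    snapshot_log: list = field(default_factory=list)  # (timestamp, snapshot_id)
    metadata_log: list = field(default_factory=list)  # (timestamp, snapshot_id, properties)


def commit(table, timestamp, snapshot_id=None, properties=None):
    """Apply one or more changes as a single commit and record the result."""
    if snapshot_id is not None:
        table.current_snapshot = snapshot_id
        table.snapshot_log.append((timestamp, snapshot_id))
    if properties is not None:
        table.properties = dict(properties)
    table.metadata_log.append(
        (timestamp, table.current_snapshot, dict(table.properties))
    )


def rollback_to_time(table, target_time, now):
    """Roll forward to the state that was current at target_time, in one commit."""
    # Assumes metadata_log is in commit order, as commit() guarantees here.
    past = [entry for entry in table.metadata_log if entry[0] <= target_time]
    if not past:
        raise ValueError("no table state at or before target_time")
    _, snapshot_id, properties = past[-1]
    # Snapshot and metadata changes land together; nothing is removed from the logs.
    commit(table, now, snapshot_id=snapshot_id, properties=properties)


table = TableState()
commit(table, 1, snapshot_id="s1", properties={"write.format.default": "parquet"})
commit(table, 5, snapshot_id="s2")  # bad snapshot at t5
commit(table, 6, properties={"write.format.default": "orc"})  # later metadata change
rollback_to_time(table, target_time=4, now=8)
```

After the call, the table is back on `s1` with its earlier properties, while the snapshot log still records that `s2` was current from t5 until the rollback at t8.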
   
   There's a bit more discussion that should happen here, but I'm open to the transaction approach. I just don't think rolling back the metadata file instead of moving forward and keeping history is a good idea.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org