Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/04 20:39:32 UTC

[GitHub] [iceberg] alexjo2144 opened a new issue, #5442: Transaction with multiple statements clear table history

alexjo2144 opened a new issue, #5442:
URL: https://github.com/apache/iceberg/issues/5442

   ### Apache Iceberg version
   
   0.14.0 (latest release)
   
   ### Query engine
   
   _No response_
   
   ### Please describe the bug 🐞
   
   If a Transaction applies multiple changes, the table history is cleared, i.e. `TableMetadata.snapshotLog` is cleared.
   
   Here's a test case, adapted from `TestTransaction#testMultipleOperationTransaction`
   
   ```java
     @Test
     public void testMultipleOperationTransaction() {
       Assert.assertEquals("Table should be on version 0", 0, (int) version());
   
       table.newAppend().appendFile(FILE_C).commit();
       List<HistoryEntry> initialHistory = table.history();
   
       TableMetadata base = readMetadata();
   
       Transaction txn = table.newTransaction();
   
       Assert.assertSame("Base metadata should not change when commit is created",
           base, readMetadata());
       Assert.assertEquals("Table should be on version 1 after txn create", 1, (int) version());
   
       txn.newAppend()
           .appendFile(FILE_A)
           .appendFile(FILE_B)
           .commit();
   
       Assert.assertSame("Base metadata should not change when commit is created",
           base, readMetadata());
       Assert.assertEquals("Table should be on version 1 after txn create", 1, (int) version());
   
       Snapshot appendSnapshot = txn.table().currentSnapshot();
   
       txn.newDelete()
           .deleteFile(FILE_A)
           .commit();
   
       Snapshot deleteSnapshot = txn.table().currentSnapshot();
   
       Assert.assertSame("Base metadata should not change when an append is committed",
           base, readMetadata());
       Assert.assertEquals("Table should be on version 1 after append", 1, (int) version());
   
       txn.commitTransaction();
   
       Assert.assertEquals("Table should be on version 2 after commit", 2, (int) version());
       Assert.assertEquals("Table should have two manifest after commit",
           2, readMetadata().currentSnapshot().allManifests(table.io()).size());
       Assert.assertEquals("Table snapshot should be the delete snapshot",
           deleteSnapshot, readMetadata().currentSnapshot());
       validateManifestEntries(readMetadata().currentSnapshot().allManifests(table.io()).get(0),
           ids(deleteSnapshot.snapshotId(), appendSnapshot.snapshotId()),
           files(FILE_A, FILE_B), statuses(Status.DELETED, Status.EXISTING));
   
       Assert.assertEquals("Table should have a snapshot for each operation",
           3, readMetadata().snapshots().size());
       validateManifestEntries(readMetadata().snapshots().get(1).allManifests(table.io()).get(0),
           ids(appendSnapshot.snapshotId(), appendSnapshot.snapshotId()),
           files(FILE_A, FILE_B), statuses(Status.ADDED, Status.ADDED));
   
       List<HistoryEntry> finalHistory = table.history();
       for (HistoryEntry historyEntry : initialHistory) {
         // Fails: entries recorded before the transaction are missing from the final history
         Assert.assertTrue(finalHistory.contains(historyEntry));
       }
     }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] alexjo2144 commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
alexjo2144 commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1211979113

   @rdblue or @danielcweeks, do either of you have any thoughts on this one? We reverted the changes in Trino that hit this case, but it would be nice to be able to put them back at some point.



[GitHub] [iceberg] kbendick commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1206814006

   Oh, this seems intentional, based on the comment above `newSnapshotLog.clear()` in the code linked below.
   
   Though there's possibly a better way of handling the intermediate history states that never actually existed than simply clearing the whole history log.
   
   I think the `snapshots` metadata table would still have all the snapshot data, but ideally only the entries that need to be trimmed from the history log would be removed.
   
   https://github.com/apache/iceberg/blob/353c53910195b81bd5ceb908c12ed8113bbb4e3b/core/src/main/java/org/apache/iceberg/TableMetadata.java#L1558-L1593
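   
   To illustrate the idea of trimming only the intermediate entries instead of clearing the whole log, here is a minimal sketch. It is not Iceberg's code: `HistoryTrimSketch`, `Entry`, and `trimIntermediate` are hypothetical names made up for this example, and the snapshot IDs and timestamps are arbitrary. It assumes the set of intermediate snapshot IDs is already known.
   
   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical sketch; these types are stand-ins, not Iceberg classes.
   class HistoryTrimSketch {
   
     // Minimal stand-in for a history log entry: (timestamp, snapshot ID).
     static final class Entry {
       final long timestampMillis;
       final long snapshotId;
   
       Entry(long timestampMillis, long snapshotId) {
         this.timestampMillis = timestampMillis;
         this.snapshotId = snapshotId;
       }
     }
   
     // Drops only the entries that point at intermediate snapshots (snapshots
     // created inside a transaction that were never the table's current snapshot)
     // and keeps every other entry in its original order.
     static List<Entry> trimIntermediate(List<Entry> log, Set<Long> intermediateIds) {
       List<Entry> trimmed = new ArrayList<>();
       for (Entry entry : log) {
         if (!intermediateIds.contains(entry.snapshotId)) {
           trimmed.add(entry);
         }
       }
       return trimmed;
     }
   
     public static void main(String[] args) {
       // Mirrors the test case: snapshot 1 predates the transaction, snapshot 2 is
       // the transaction's append, snapshot 3 is its delete (the final state).
       List<Entry> log =
           List.of(new Entry(1000L, 1L), new Entry(2000L, 2L), new Entry(3000L, 3L));
       List<Entry> trimmed = trimIntermediate(log, Set.of(2L));
       for (Entry entry : trimmed) {
         System.out.println(entry.timestampMillis + " -> snapshot " + entry.snapshotId);
       }
     }
   }
   ```
   
   With these inputs the entries for snapshots 1 and 3 survive, so the pre-transaction history stays visible. The sketch only covers intermediate transaction snapshots; entries for snapshots that were removed from the table are a separate case that it does not handle.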



[GitHub] [iceberg] kbendick commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1217100515

   I'm planning on bringing this up with Ryan tomorrow in a meeting. Thanks guys for reporting.



[GitHub] [iceberg] rdblue closed issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
rdblue closed issue #5442: Transaction with multiple statements clear table history
URL: https://github.com/apache/iceberg/issues/5442



[GitHub] [iceberg] kbendick commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1219730621

   @alexjo2144 @findepi This has been reviewed and a solution seems to be coming: https://github.com/apache/iceberg/pull/5568/files#r949392983



[GitHub] [iceberg] rdblue commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1219845923

   Fixed! Thanks for reporting this, @alexjo2144. I've also marked this to be included in the next patch release.



[GitHub] [iceberg] alexjo2144 commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
alexjo2144 commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1219939362

   Thank you both!



[GitHub] [iceberg] kbendick commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1206803875

   Does this still happen if the table is refreshed?



[GitHub] [iceberg] kbendick commented on issue #5442: Transaction with multiple statements clear table history

Posted by GitBox <gi...@apache.org>.
kbendick commented on issue #5442:
URL: https://github.com/apache/iceberg/issues/5442#issuecomment-1206851665

   I tried this with `refresh` and it doesn't make a difference.
   
   I think the logic for `intermediateSnapshotIdSet` possibly needs to be updated, but I'm still looking through the comment about time-travel queries.
   
   If anybody else has any ideas, feel free to chime in 😅 

