You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/16 00:15:51 UTC

[GitHub] [iceberg] sririshindra opened a new pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

sririshindra opened a new pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135


   While testing rollback_to_snaphot feature, we found that this is feature is failing in some cases.
   
   In the setSetCurrentSnapshot method '[null](https://github.com/apache/iceberg/blob/apache-iceberg-0.13.0/core/src/main/java/org/apache/iceberg/TableMetadata.java#L924)' is being passed for timestamp which eventually causes the failure of the following [precondition check](https://github.com/apache/iceberg/blob/apache-iceberg-0.13.0/core/src/main/java/org/apache/iceberg/TableMetadata.java#L306).  The timeStamp should also be passed along with the Snapshot just like [here](https://github.com/apache/iceberg/blob/apache-iceberg-0.13.0/core/src/main/java/org/apache/iceberg/TableMetadata.java#L938 )
   
   This PR fixes this bug in 0.13.x. I tested this with our internal tests and also by running the unit tests in Iceberg. This issue no longer exists in master because of [4089](https://github.com/apache/iceberg/pull/4089). The current fix is ported from [here](https://github.com/apache/iceberg/pull/4089#discussion_r806331672)
   
   Steps to reproduce the rollback_to_snapshot bug in spark.
   
   Each insert operation below is done separately in its own spark app. 
     
   
   
       sql("INSERT INTO %s VALUES (3, 'zzz', 1)" % table1)    
       sql("INSERT INTO %s VALUES (6, 'www', 1)" % table1)
       sql("INSERT INTO %s VALUES (9, 'zyz', 1)" % table1)
       sql("INSERT INTO %s VALUES (2, 'yy', 2)" % table1)
       sql("INSERT INTO %s VALUES (4, 'xx', 2)" % table1)
       sql("INSERT INTO %s VALUES (1, 'x', 3)" % table1)`
       
      # launch a new spark shell and follow the following steps
       results, _ = sql("select id, data, id2 from %s order by id2, id" % table1) # some internal code to get the results.
       results, _ = sql("select snapshot_id from %s.%s.snapshots" % (DATABASE, table1))
       snapshots = self.get_rows(results) # some internal code to get snapshots in a python list.
       sql("CALL spark_catalog.system.rollback_to_snapshot('%s', %s)" % (table1, str(snapshots[2][0])))`
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r809449375



##########
File path: core/src/main/java/org/apache/iceberg/TableMetadata.java
##########
@@ -501,6 +501,10 @@ public TableMetadata replaceCurrentSnapshot(Snapshot snapshot) {
     return new Builder(this).setCurrentSnapshot(snapshot).build();
   }
 
+  public TableMetadata replaceCurrentSnapshot(long snapshotId) {
+    return new Builder(this).setCurrentSnapshot(snapshotId).build();
+  }
+

Review comment:
       I filed #4155 to remove the unused methods.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#issuecomment-1043185981


   Looks good! I'll merge when tests are passing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] sririshindra commented on pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
sririshindra commented on pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#issuecomment-1043240608


   > Looks good! I'll merge when tests are passing.
   
   Thanks @rdblue 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r809446985



##########
File path: core/src/main/java/org/apache/iceberg/TableMetadata.java
##########
@@ -501,6 +501,10 @@ public TableMetadata replaceCurrentSnapshot(Snapshot snapshot) {
     return new Builder(this).setCurrentSnapshot(snapshot).build();
   }
 
+  public TableMetadata replaceCurrentSnapshot(long snapshotId) {
+    return new Builder(this).setCurrentSnapshot(snapshotId).build();
+  }
+

Review comment:
       Yes, that's right. I missed this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r807437913



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -283,7 +283,10 @@ public void commit() {
             Snapshot newSnapshot = apply();
             newSnapshotId.set(newSnapshot.snapshotId());
             TableMetadata updated;
-            if (stageOnly) {
+            if (base.snapshot(newSnapshot.snapshotId()) != null) {
+              // this is a rollback operation
+              updated = base.replaceCurrentSnapshot(newSnapshot.snapshotId());
+            } else if (stageOnly) {

Review comment:
       @rdblue, would the following be acceptable? (from line 285)
   ```
               TableMetadata.Builder update = TableMetadata.buildFrom(base);
               if (base.snapshot(newSnapshot.snapshotId()) != null) {
                 // this is a rollback or cherrypick operation
                 update.setCurrentSnapshot(newSnapshot.snapshotId());
                 } else if (stageOnly) {
                 update.addSnapshot(newSnapshot);
               } else {
                 update.setCurrentSnapshot(newSnapshot);
               }
   
               TableMetadata updated = update.build();
               if (updated.changes().isEmpty()) {
                 // do not commit if the metadata has not changed. for example, this may happen when setting the current
                 // snapshot to an ID that is already current. note that this check uses identity.
                 return;
               }
   ```
   I know the point of #4089 is to replace `setCurrentSnapshot` with `setBranchSnapshot`, but in 0.13.x, we don't support branching yet, and the commits #4089 builds on are not present. The above mirrors your change to `SnapshotProducer` in #4089.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] sririshindra commented on pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
sririshindra commented on pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#issuecomment-1040931972


   @rdblue As you suggested in https://github.com/apache/iceberg/pull/4088 I opened this PR on top of 0.13.x branch to address the rollback issue. Please review it at your convenience.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r807437913



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -283,7 +283,10 @@ public void commit() {
             Snapshot newSnapshot = apply();
             newSnapshotId.set(newSnapshot.snapshotId());
             TableMetadata updated;
-            if (stageOnly) {
+            if (base.snapshot(newSnapshot.snapshotId()) != null) {
+              // this is a rollback operation
+              updated = base.replaceCurrentSnapshot(newSnapshot.snapshotId());
+            } else if (stageOnly) {

Review comment:
       @rdblue, would the following be acceptable? (from line 285)
   ```
               TableMetadata.Builder update = TableMetadata.buildFrom(base);
               if (base.snapshot(newSnapshot.snapshotId()) != null) {
                 // this is a rollback or cherrypick operation
                 update.setCurrentSnapshot(newSnapshot.snapshotId());
                 } else if (stageOnly) {
                 update.addSnapshot(newSnapshot);
               } else {
                 update.setCurrentSnapshot(newSnapshot);
               }
   
               TableMetadata updated = update.build();
               if (updated.changes().isEmpty()) {
                 // do not commit if the metadata has not changed. for example, this may happen when setting the current
                 // snapshot to an ID that is already current. note that this check uses identity.
                 return;
               }
   ```
   I know the point of #4089 is to replace `setCurrentSnapshot` with `setBranchSnapshot`, but in 0.13.x, we don't support branching yet, and the commits #4089 builds on are not present.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r807429704



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -283,7 +283,10 @@ public void commit() {
             Snapshot newSnapshot = apply();
             newSnapshotId.set(newSnapshot.snapshotId());
             TableMetadata updated;
-            if (stageOnly) {
+            if (base.snapshot(newSnapshot.snapshotId()) != null) {
+              // this is a rollback operation
+              updated = base.replaceCurrentSnapshot(newSnapshot.snapshotId());
+            } else if (stageOnly) {

Review comment:
       Can you port the changes from the other fix? That removed methods from `TableMetadata` rather than adding new ones. I think it would be best to keep 0.13.x and master in sync rather than having slightly different fixes.
   
   Thanks, @sririshindra!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r808511723



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -283,7 +283,10 @@ public void commit() {
             Snapshot newSnapshot = apply();
             newSnapshotId.set(newSnapshot.snapshotId());
             TableMetadata updated;
-            if (stageOnly) {
+            if (base.snapshot(newSnapshot.snapshotId()) != null) {
+              // this is a rollback operation
+              updated = base.replaceCurrentSnapshot(newSnapshot.snapshotId());
+            } else if (stageOnly) {

Review comment:
       Yes, calling the right `setCurrentSnapshot` is a good way to backport this. Thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a change in pull request #4135: Core: Call the replaceCurrentSnapshot with snapshotId for rollback operations

Posted by GitBox <gi...@apache.org>.
wypoon commented on a change in pull request #4135:
URL: https://github.com/apache/iceberg/pull/4135#discussion_r809435038



##########
File path: core/src/main/java/org/apache/iceberg/TableMetadata.java
##########
@@ -501,6 +501,10 @@ public TableMetadata replaceCurrentSnapshot(Snapshot snapshot) {
     return new Builder(this).setCurrentSnapshot(snapshot).build();
   }
 
+  public TableMetadata replaceCurrentSnapshot(long snapshotId) {
+    return new Builder(this).setCurrentSnapshot(snapshotId).build();
+  }
+

Review comment:
       I think this was an oversight, right? We don't need this new method; it is not used.
   In fact, with the above change to `SnapshotProducer`, even `replaceCurrentSnapshot(Snapshot)` is no longer used and can be removed, I think.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org