You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/09/29 08:08:31 UTC

[GitHub] [iceberg] hameizi opened a new pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

hameizi opened a new pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204


   This PR is handle the case that RewriteFiles and RowDelta commit the transaction at the same time what will cause duplicate record in table.
   The approach is just like what discuss in https://github.com/apache/iceberg/issues/2308#issuecomment-801576874


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-939655146


   @rdblue In the newest commit i add one configuration to config whether validate there is new delete file is added when compacting file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-929949544


   @yyanyy @stevenzwu @rdblue Can you help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi removed a comment on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi removed a comment on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-939650574


   @rdblue I commit one new commit for your suggestion, can you help review one more time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-951054059


   I've been thinking about this case and I think that the right way to do this is to set the sequence number on individual files rather than at the snapshot level. I don't think that we should change the sequence number of the snapshot or manifest list. We should just set the sequence number of individual data files. Basically, I agree with @yyanyy's [comment](https://github.com/apache/iceberg/issues/2308#issuecomment-801330831):
   
   > I think there are two seqNum concepts here: seqNum for the table/commit and seqNum for the file. I think it's a reasonable approach to mark the rewritten files with the old seqNum, but I'm not sure if we necessarily need to use an old sequence number for the commit since they are stored as part of the snapshot, and suddenly have an old seqNum within the snapshot could be confusing even if there's no other implication.
   
   I think we need to set the file sequence numbers. That raises the question: what do we set them to? Ideally, we would use the latest, but the delete file commit has claimed that number so we need to go with the sequence number that is less than any other commits. That would be the sequence number that was current when rewrite operation started. It would be nice to find a way around reusing a sequence number from a different snapshot, but I don't see a good way to do that right now. We can possibly fix that up later by skipping sequence numbers.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-962701123


   @hameizi, can you help review and test #3480? That's an alternative approach to what you're doing here that sets the sequence number per data file. I think that change is actually really important. While I was reviewing this, I thought that it was probably not a good idea to set the sequence number by reusing inheritance. Now that we've thought through the use case more, we've come up with a good reason not to: it makes it so we can't recover the sequence number where files were added, not just the sequence number where the data lives in time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi edited a comment on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi edited a comment on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-951500192


   @rdblue I think import file sequence numbers will cause unnecessary complexity, and the sequence number of manifest list can mean file commit sequence, the data file sequence number just describe different sequence of snapshot and data file, but manifest list sequence numbers can do this too, because there is snapshotid in manifest list file ,so wo don't need hold same sequence numbers for one snapshot in the snapshot and manifest list. So even if we import file sequence numbers it just cause the same effect as the sequence number of manifest list. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-951500192


   @rdblue I think import file sequence numbers will cause unnecessary complexity, and the sequence number of manifest list can mean file commit sequence, because there is snapshotid in manifest list file ,so wo don't need hold same sequence numbers for one snapshot in the snapshot and manifest list. So even if we import file sequence numbers it just cause the same effect as the sequence number of manifest list. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-939736573


   @rdblue I commit one new commit for your suggestion, can you help review one more time? The fix detail as follows:
   1.add one configuration to config whether vaildate there is new delete files be added when compacting files.
   2.change name sequenceNumber to sequenceNumberOverride
   3.update documentation  for method setSequenceNumber
   
   > With this approach, I think we need a validation that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.
   
   The override sequence number is init when compact action generate fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it can guarantee that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-946861520


   Running CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-930697655


   > is this a duplicate of #3069 ?
   
   @jackye1995 it's not, this PR is focus on handle duplicate record when there is some delete file be commit during rewrite action. And i update description of this PR that describe how this PR handle this case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-945531745


   @rdblue @openinx Could you help take a look again?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719756194



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -133,6 +133,15 @@ public ThisT deleteWith(Consumer<String> deleteCallback) {
    */
   protected abstract String operation();
 
+  /**
+   * A Long that write sequenceNumber in manifest-list file.
+   *
+   * @return a string operation
+   */
+  protected Long sequenceNumber() {

Review comment:
       I think this needs a more specific name, like `sequenceNumberOverride`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r725834055



##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -44,6 +45,11 @@ protected String operation() {
     return DataOperations.REPLACE;
   }
 
+  @Override
+  protected Long sequenceNumber() {
+    return replaceSequenceNumber;

Review comment:
       The override sequence number is init when compact action generate fileScanTasks in https://github.com/apache/iceberg/pull/3204/files#:~:text=long%20sequenceNumber%20%3D%20table.currentSnapshot().sequenceNumber()%3B, so it can guarantee that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-929940931


   @openinx Can you help take a look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719755694



##########
File path: core/src/main/java/org/apache/iceberg/SnapshotProducer.java
##########
@@ -167,7 +176,8 @@ public Snapshot apply() {
       OutputFile manifestList = manifestListPath();
 
       try (ManifestListWriter writer = ManifestLists.write(
-          ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId, sequenceNumber)) {
+          ops.current().formatVersion(), manifestList, snapshotId(), parentSnapshotId,
+          operation().equals(DataOperations.REPLACE) && sequenceNumber() != null ? sequenceNumber() : sequenceNumber)) {

Review comment:
       I think this is a clever way to fix the problem. What we had considered before is setting the sequence number of individual files rather than the sequence number used for the manifest list. This is an easier way to set the sequence number for all new data files in the snapshot.
   
   I need to think about whether this is the right approach a bit more. As long as we have a static sequence number, using inheritance is only a convenience. We could set that static sequence number on the individual data or delete files that we add in the commit. I tend to lean toward that solution because it minimizes the places that use the overridden sequence number -- so we know that the manifest list and manifests all have the latest sequence number that is assigned to the snapshot rather than also being re-sequenced.
   
   I'll think about the trade-off some more.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue closed pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue closed pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719735095



##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -97,11 +103,13 @@ public RewriteFiles validateFromSnapshot(long snapshotId) {
     return this;
   }
 
+  @Override
+  public RewriteFiles setSequenceNumber(long sequenceNumber) {
+    this.replaceSequenceNumber = sequenceNumber;
+    return this;
+  }
+
   @Override
   protected void validate(TableMetadata base) {
-    if (replacedDataFiles.size() > 0) {
-      // if there are replaced data files, there cannot be any new row-level deletes for those data files
-      validateNoNewDeletesForDataFiles(base, startingSnapshotId, replacedDataFiles);
-    }

Review comment:
       This should not be removed. There is a valid use case for the operation to check whether there are conflicts instead of re-sequencing data files. If you want, you can add a configuration method to enable/disable the validation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-939650574


   @rdblue I commit one new commit for your suggestion, can you help review one more time?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719735434



##########
File path: api/src/main/java/org/apache/iceberg/RewriteFiles.java
##########
@@ -73,4 +73,12 @@ RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> dele
    * @return this for method chaining
    */
   RewriteFiles validateFromSnapshot(long snapshotId);
+
+  /**
+   * Set the sequenceNumber write in manifest-list file.
+   *
+   * @param sequenceNumber a sequenceNumber
+   * @return this for method chaining
+   */
+  RewriteFiles setSequenceNumber(long sequenceNumber);

Review comment:
       I think we need better documentation for what this method does and when you would call it, since this affects correctness.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-950493738


   @rdblue Hello, is there any progress ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719758975



##########
File path: core/src/test/java/org/apache/iceberg/TestRewriteFiles.java
##########
@@ -612,33 +612,4 @@ public void testAlreadyDeletedFile() {
     Assert.assertEquals("Only 3 manifests should exist", 3, listManifestFiles().size());
   }
 
-  @Test
-  public void testNewDeleteFile() {

Review comment:
       I see that this is related to removing the validation in `BaseRewriteFiles`. Like I noted there, we want to keep that validation and use it in certain cases.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719757359



##########
File path: core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java
##########
@@ -44,6 +45,11 @@ protected String operation() {
     return DataOperations.REPLACE;
   }
 
+  @Override
+  protected Long sequenceNumber() {
+    return replaceSequenceNumber;

Review comment:
       With this approach, I think we need a validation that none of the data or delete files that are being replaced have sequence numbers newer than the override sequence number.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#discussion_r719728441



##########
File path: core/src/test/java/org/apache/iceberg/TestRewriteFiles.java
##########
@@ -612,33 +612,4 @@ public void testAlreadyDeletedFile() {
     Assert.assertEquals("Only 3 manifests should exist", 3, listManifestFiles().size());
   }
 
-  @Test
-  public void testNewDeleteFile() {

Review comment:
       Why are you removing tests? This is probably incorrect.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-930655467


   is this a duplicate of #3069  ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi removed a comment on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi removed a comment on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-939655146


   @rdblue In the newest commit i add one configuration to config whether validate there is new delete file is added when compacting file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] hameizi commented on pull request #3204: Handle the case that RewriteFiles and RowDelta commit the transaction…

Posted by GitBox <gi...@apache.org>.
hameizi commented on pull request #3204:
URL: https://github.com/apache/iceberg/pull/3204#issuecomment-929940931






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org