Posted to issues@iceberg.apache.org by "aokolnychyi (via GitHub)" <gi...@apache.org> on 2023/04/25 16:16:18 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7389: Spark 3.4: Implement rewrite position deletes

aokolnychyi commented on code in PR #7389:
URL: https://github.com/apache/iceberg/pull/7389#discussion_r1176750577


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -57,6 +57,16 @@ default RewriteFiles rewriteFiles(Set<DataFile> filesToDelete, Set<DataFile> fil
   RewriteFiles rewriteFiles(
       Set<DataFile> filesToDelete, Set<DataFile> filesToAdd, long sequenceNumber);
 
+  /**
+   * Add a rewrite that replaces one set of delete files with another set that contains the same
+   * data.
+   *
+   * @param filesToDelete files that will be replaced (deleted), cannot be null or empty.
+   * @param filesToAdd files that will be added, cannot be null or empty.
+   * @return this for method chaining
+   */
+  RewriteFiles rewriteDeleteFiles(Set<DeleteFile> filesToDelete, Set<DeleteFile> filesToAdd);

Review Comment:
   There is probably a problem in `RewriteFiles` right now. I think this API would assign the new delete files a brand-new data sequence number, whereas we should use the max data sequence number of all rewritten position deletes.
   
   On a side note, I am not sure we can ever rewrite equality deletes across sequence numbers. Let me think.
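
To make the position-delete suggestion above concrete, here is a minimal sketch of picking the data sequence number for the replacement files. `DeleteFileInfo` and `maxDataSequenceNumber` are hypothetical names used only for illustration; they are not part of the Iceberg API, and the real `DeleteFile` interface is deliberately not used so the snippet runs on its own.

```java
import java.util.List;

public class RewriteSequenceSketch {

  // Hypothetical stand-in for a position delete file; NOT the Iceberg DeleteFile interface.
  record DeleteFileInfo(String path, long dataSequenceNumber) {}

  // Returns the largest data sequence number among the delete files being rewritten.
  // Under the suggestion above, the replacement delete files would carry this value
  // instead of a brand-new sequence number.
  static long maxDataSequenceNumber(List<DeleteFileInfo> rewritten) {
    if (rewritten == null || rewritten.isEmpty()) {
      throw new IllegalArgumentException("Files to rewrite cannot be null or empty");
    }
    return rewritten.stream()
        .mapToLong(DeleteFileInfo::dataSequenceNumber)
        .max()
        .getAsLong(); // safe: the list was checked to be non-empty
  }

  public static void main(String[] args) {
    List<DeleteFileInfo> rewritten =
        List.of(
            new DeleteFileInfo("deletes-a.parquet", 3L),
            new DeleteFileInfo("deletes-b.parquet", 7L),
            new DeleteFileInfo("deletes-c.parquet", 5L));

    System.out.println(maxDataSequenceNumber(rewritten)); // prints 7
  }
}
```

How such a value would be passed into `rewriteDeleteFiles` is left open here, since the method in this diff takes no sequence number argument.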



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

For additional commands, e-mail: issues-help@iceberg.apache.org