Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/18 22:45:55 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #4703: API: Optionally ignore position deletes in rewrite validation

aokolnychyi commented on code in PR #4703:
URL: https://github.com/apache/iceberg/pull/4703#discussion_r876439278


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -84,4 +84,12 @@ RewriteFiles rewriteFiles(Set<DataFile> dataFilesToReplace, Set<DeleteFile> dele
    * @return this for method chaining
    */
   RewriteFiles validateFromSnapshot(long snapshotId);
+
+  /**
+   * Ignore position deletes in rewrite validation. A Flink upsert job only generates position deletes within the
+   * ongoing transaction, so it is not necessary to validate position deletes when rewriting.
+   *
+   * @return this for method chaining
+   */
+  RewriteFiles ignorePosDeletesInValidation();

Review Comment:
   I think this method disables position delete file validation altogether, which may cause correctness issues if multiple engines write to the same table. If I understand correctly, our goal is to avoid failing rewrites of data files that have a matching position delete file added in the same snapshot (i.e. Flink upsert), assuming that position delete file was applied when the data file was compacted. What if Spark added a separate position delete file for the data file produced by Flink? We can't commit the rewrite in that case, can we?
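To make the distinction concrete, here is a minimal, self-contained sketch of the narrower check the comment describes, as opposed to a flag that skips position delete validation entirely. It does not use Iceberg's actual validation code: `RewriteDeleteValidation`, `DataFileInfo`, `PosDeleteInfo` and `conflictingDeletes` are hypothetical names. It also assumes that a position delete file added in the same snapshot as the data file it references was already applied when that data file was compacted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; none of these types or methods are part of Iceberg's API. */
public class RewriteDeleteValidation {

  /** A data file selected for compaction, with the ID of the snapshot that added it. */
  public record DataFileInfo(String path, long addedSnapshotId) {}

  /** A position delete file the validation inspects: its target data file and adding snapshot. */
  public record PosDeleteInfo(String path, String referencedDataFile, long addedSnapshotId) {}

  /**
   * Returns the position delete files that should block the rewrite. A delete file that targets a
   * rewritten data file is tolerated only when it was added in the same snapshot as that data file
   * (the Flink upsert case), assuming compaction already applied it. Any other position delete
   * against a rewritten file, e.g. one added later by Spark, is reported as a conflict.
   */
  public static List<PosDeleteInfo> conflictingDeletes(
      Set<DataFileInfo> filesToRewrite, List<PosDeleteInfo> candidateDeletes) {
    List<PosDeleteInfo> conflicts = new ArrayList<>();
    for (PosDeleteInfo delete : candidateDeletes) {
      for (DataFileInfo file : filesToRewrite) {
        boolean targetsFile = file.path().equals(delete.referencedDataFile());
        boolean sameSnapshot = file.addedSnapshotId() == delete.addedSnapshotId();
        if (targetsFile && !sameSnapshot) {
          conflicts.add(delete);
        }
      }
    }
    return conflicts;
  }

  public static void main(String[] args) {
    // Data file and position delete written together by a Flink upsert commit (snapshot 10).
    DataFileInfo flinkFile = new DataFileInfo("data/a.parquet", 10L);
    PosDeleteInfo flinkDelete = new PosDeleteInfo("deletes/a-flink.parquet", "data/a.parquet", 10L);
    // A separate position delete added later by another engine (snapshot 12).
    PosDeleteInfo sparkDelete = new PosDeleteInfo("deletes/a-spark.parquet", "data/a.parquet", 12L);

    List<PosDeleteInfo> conflicts =
        conflictingDeletes(Set.of(flinkFile), List.of(flinkDelete, sparkDelete));
    System.out.println(conflicts.size() + " conflict(s): " + conflicts.get(0).path());
  }
}
```

Running `main` prints `1 conflict(s): deletes/a-spark.parquet`. The Flink delete from the data file's own snapshot is tolerated, but the Spark delete still blocks the rewrite. A blanket `ignorePosDeletesInValidation()` would let that rewrite commit and silently drop the Spark delete.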



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@iceberg.apache.org