You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "aokolnychyi (via GitHub)" <gi...@apache.org> on 2023/05/02 20:50:02 UTC

[GitHub] [iceberg] aokolnychyi opened a new pull request, #7501: API, Core: Make RewriteFiles flexible

aokolnychyi opened a new pull request, #7501:
URL: https://github.com/apache/iceberg/pull/7501

   This PR makes `RewriteFiles` more flexible and aligned with other APIs like `OverwriteFiles`. One valid use case is discussed in #7389, where we want to replace one set of delete files with another one. Instead of adding another overloaded method, we should make the interface flexible enough to support various combinations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1185611172


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.

Review Comment:
   Does anyone else have any preferences?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183795251


##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -86,7 +86,7 @@ public void testEmptyTable() {
                     .rewriteFiles(
                         ImmutableSet.of(),
                         ImmutableSet.of(FILE_A_DELETES),
-                        ImmutableSet.of(FILE_A),

Review Comment:
   Exactly, this is testing an error message so it was, probably, overlooked.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183099131


##########
core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java:
##########
@@ -26,6 +26,9 @@
 class BaseRewriteFiles extends MergingSnapshotProducer<RewriteFiles> implements RewriteFiles {
   private final Set<DataFile> replacedDataFiles = Sets.newHashSet();
   private Long startingSnapshotId = null;
+  private boolean replacesDeleteFiles = false;

Review Comment:
   rather than fields, can we check whether "repalcesDataFiles" or other data structures have contents? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183801342


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DeleteFile deleteFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement addFile");
+  }
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DeleteFile deleteFile) {

Review Comment:
   That's a good point, I'll add. That said, neither `DataFile` nor `DeleteFile` implementations provide proper equality at the moment so `Set` was kind of useless.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#issuecomment-1538931718

   I double checked a few places that use this API as well as the underlying implementation methods. We do use `delete` in most places, so I'll go ahead and merge this PR. If there are any comments later, we can still change this before the release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183100507


##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -86,7 +86,7 @@ public void testEmptyTable() {
                     .rewriteFiles(
                         ImmutableSet.of(),
                         ImmutableSet.of(FILE_A_DELETES),
-                        ImmutableSet.of(FILE_A),

Review Comment:
   This would be removing a delete file and adding a data file and a new delete file? Seems impossible in rewrite



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183809676


##########
core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java:
##########
@@ -26,6 +26,9 @@
 class BaseRewriteFiles extends MergingSnapshotProducer<RewriteFiles> implements RewriteFiles {
   private final Set<DataFile> replacedDataFiles = Sets.newHashSet();
   private Long startingSnapshotId = null;
+  private boolean replacesDeleteFiles = false;

Review Comment:
   Do you mean checking the underlying `ManifestFilterManager`, `List<DataFile>` and the map of delete files? I thought about this but was initially reluctant to touch `ManifestFilterManager` (4 different ways to delete files).
   
   That said, these boolean flags are a bit odd. Now I am thinking it makes sense to spend a bit more time and check the underlying structures. @RussellSpitzer, is this what you meant?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183876608


##########
core/src/main/java/org/apache/iceberg/BaseRewriteFiles.java:
##########
@@ -26,6 +26,9 @@
 class BaseRewriteFiles extends MergingSnapshotProducer<RewriteFiles> implements RewriteFiles {
   private final Set<DataFile> replacedDataFiles = Sets.newHashSet();
   private Long startingSnapshotId = null;
+  private boolean replacesDeleteFiles = false;

Review Comment:
   Yep, sorry if I was unclear. It's just weird to keep this as state flags when they are just proxy variables for the actual state of containers we have access to.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183799559


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.

Review Comment:
   Do you mean cases when we rewrite a position delete file but discard no longer valid deletes?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183877645


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DeleteFile deleteFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement addFile");
+  }
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DeleteFile deleteFile) {

Review Comment:
   True, Sounds like we can drop it, although may want to fix that in the future since we don't actually support multiple instances of the same file within a table.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183039423


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.

Review Comment:
   Using the word `delete` is a bit questionable in this context but that's what the Javadoc says. It is also similar to the naming used in `OverwriteFiles` and other APIs. I can see this being called `rewriteFile` or something too.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#issuecomment-1533209092

   CI failed because of the API checks. I'll fix them together with addressing the feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183096753


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DeleteFile deleteFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement addFile");
+  }
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DeleteFile deleteFile) {

Review Comment:
   Do we want to specify that adding the same delete file/data file twice will have no effect? We required set's before



##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DeleteFile deleteFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement addFile");
+  }
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  default RewriteFiles addFile(DeleteFile deleteFile) {

Review Comment:
   Do we want to specify that adding the same delete file/data file twice will have no effect? We required set
   s before



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#issuecomment-1532140642

   cc @nastra @jackye1995 @amogh-jahagirdar @szehon-ho @RussellSpitzer @rdblue @flyrain 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183041878


##########
core/src/test/java/org/apache/iceberg/TestRewriteFiles.java:
##########
@@ -86,7 +86,7 @@ public void testEmptyTable() {
                     .rewriteFiles(
                         ImmutableSet.of(),
                         ImmutableSet.of(FILE_A_DELETES),
-                        ImmutableSet.of(FILE_A),

Review Comment:
   Was it ever a valid use case?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183794060


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.

Review Comment:
   Yeah, it is unfortunate we overlooked the distinction between physical `delete` and logical `remove`. Let me double check other APIs to see whether there are also any good examples. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1184004938


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.

Review Comment:
   Yea the way I read it, it may indicate that all the content will be rewritten.  But its likely in my case that it is just dropped without being rewritten.  Not sure if its just worth calling out here 'whose content was rewritten'?  Maybe we can do the other way around, and add the javadoc on addFile():  'Add a DeleteFile rewriting content of a deleted DeleteFile'.  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1186471821


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.

Review Comment:
   @jackye1995 @rdblue, any thoughts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183040379


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DataFile dataFile);
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DeleteFile deleteFile);
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DataFile dataFile);
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DeleteFile deleteFile);
+
+  /**
+   * Configure the data sequence number for this rewrite operation. This data sequence number will
+   * be used for all new data files that are added in this rewrite. This method is helpful to avoid
+   * commit conflicts between data compaction and adding equality deletes.
+   *
+   * @param sequenceNumber a data sequence number
+   * @return this for method chaining
+   */
+  RewriteFiles dataSequenceNumber(long sequenceNumber);

Review Comment:
   We can only set a data sequence number for data files. Whenever we rewrite delete files, we have to rely on sequence numbers of source files. I can see this being called `newDataFilesDataSequenceNumber` but that's long.



##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DataFile dataFile);
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DeleteFile deleteFile);
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DataFile dataFile);
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DeleteFile deleteFile);
+
+  /**
+   * Configure the data sequence number for this rewrite operation. This data sequence number will
+   * be used for all new data files that are added in this rewrite. This method is helpful to avoid
+   * commit conflicts between data compaction and adding equality deletes.
+   *
+   * @param sequenceNumber a data sequence number
+   * @return this for method chaining
+   */
+  RewriteFiles dataSequenceNumber(long sequenceNumber);

Review Comment:
   We can only set a data sequence number for data files. Whenever we rewrite delete files, we have to rely on sequence numbers of source files. I can see this being called `newDataFilesDataSequenceNumber` but that's too long.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183040379


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DataFile dataFile);
+
+  /**
+   * Delete a delete file whose content was rewritten.
+   *
+   * @param deleteFile a rewritten delete file
+   * @return this for method chaining
+   */
+  RewriteFiles deleteFile(DeleteFile deleteFile);
+
+  /**
+   * Add a new data file.
+   *
+   * @param dataFile a new data file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DataFile dataFile);
+
+  /**
+   * Add a new delete file.
+   *
+   * @param deleteFile a new delete file
+   * @return this for method chaining
+   */
+  RewriteFiles addFile(DeleteFile deleteFile);
+
+  /**
+   * Configure the data sequence number for this rewrite operation. This data sequence number will
+   * be used for all new data files that are added in this rewrite. This method is helpful to avoid
+   * commit conflicts between data compaction and adding equality deletes.
+   *
+   * @param sequenceNumber a data sequence number
+   * @return this for method chaining
+   */
+  RewriteFiles dataSequenceNumber(long sequenceNumber);

Review Comment:
   We can only set a data sequence number for data files. Whenever we rewrite delete files, we have to rely on sequence numbers of source files.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183078559


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,57 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.

Review Comment:
   Our terminology always throw me a bit off here. We struggle with "delete" = physically remove and "delete" mark as logically removed. Both are used in various places and sometimes we also use "remove".
   
   Here I think "originalFile" maybe is more of the logical description, "original" => "rewritten" instead of "add" and "delete" which are really the implementation details. 
   
   That said i'm fine with just "add" and "delete" since we use them in many other places.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a diff in pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho commented on code in PR #7501:
URL: https://github.com/apache/iceberg/pull/7501#discussion_r1183088043


##########
api/src/main/java/org/apache/iceberg/RewriteFiles.java:
##########
@@ -34,13 +34,72 @@
  * will throw a {@link ValidationException}.
  */
 public interface RewriteFiles extends SnapshotUpdate<RewriteFiles> {
+  /**
+   * Delete a data file whose content was rewritten.
+   *
+   * @param dataFile a rewritten data file
+   * @return this for method chaining
+   */
+  default RewriteFiles deleteFile(DataFile dataFile) {
+    throw new UnsupportedOperationException(
+        this.getClass().getName() + " does not implement deleteFile");
+  }
+
+  /**
+   * Delete a delete file whose content was rewritten.

Review Comment:
   Question: will this be not entirely accurate for RewritePositionDelete?  We may end up not rewriting invalid deletes.  Or is 'rewritten' more nuanced?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #7501: API, Core: Make RewriteFiles flexible

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi merged PR #7501:
URL: https://github.com/apache/iceberg/pull/7501


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org