You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "gaborkaszab (via GitHub)" <gi...@apache.org> on 2023/05/08 12:16:32 UTC

[GitHub] [iceberg] gaborkaszab opened a new pull request, #7555: API, Core: Expose file and data sequence numbers through ContentFile

gaborkaszab opened a new pull request, #7555:
URL: https://github.com/apache/iceberg/pull/7555

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1539292694

   I think this approach is correct, I had mostly minor comments.
   
   I also wonder whether we should prohibit passing new data files with a set explicit `fileSequenceNumber`, just like we don't allow new manifests with a snapshot ID. The file sequence number must be assigned at commit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1194486831


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -410,11 +410,19 @@ void validateSnapshot(Snapshot old, Snapshot snap, Long sequenceNumber, DataFile
       if (sequenceNumber != null) {
         V1Assert.assertEquals(
             "Data sequence number should default to 0", 0, entry.dataSequenceNumber().longValue());
+        V1Assert.assertEquals(
+            "Data sequence number should default to 0",

Review Comment:
   @gaborkaszab, you are correct that rewrites may preserve the initial data sequence number. This method would not work for such rewrites. I believe we have other tests that pass an iterator of sequence numbers that should be used in those cases. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1538306947

   Resolves https://github.com/apache/iceberg/issues/7449


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188047485


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -562,6 +544,44 @@ void validateDeleteManifest(
     Assert.assertFalse("Should find all files in the manifest", expectedFiles.hasNext());
   }
 
+  private <T extends ContentFile<T>> void validateManifestSequenceNumbers(

Review Comment:
   This is a good way to test the core logic. Shall we also add tests at a higher level? For instance, it would be great to check methods for added/removed files in `Snapshot` behave correctly or that files returned via scans also have this information.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188125493


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -410,11 +410,19 @@ void validateSnapshot(Snapshot old, Snapshot snap, Long sequenceNumber, DataFile
       if (sequenceNumber != null) {
         V1Assert.assertEquals(
             "Data sequence number should default to 0", 0, entry.dataSequenceNumber().longValue());
+        V1Assert.assertEquals(
+            "Data sequence number should default to 0",

Review Comment:
   do we need assert on file sequence number too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188012934


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Is there any particular reason to have these two as a separate block? My initial thought was to put them somewhere next to `partitionSpecId` above. Right now, `BaseFile` only have two blocks to keep optional fields separate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188016208


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -222,6 +227,24 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long dataSequenceNumber() {
+    return this.dataSequenceNumber;
+  }
+
+  public void setDataSequenceNumber(Long dataSequenceNumber) {
+    this.dataSequenceNumber = dataSequenceNumber;
+  }
+
+  @Override
+  public Long fileSequenceNumber() {
+    return this.fileSequenceNumber;

Review Comment:
   minor: Same as above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189841619


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */
+  default Long fileSequenceNumber() {
+    return null;

Review Comment:
   Thanks for pointing this out! I wasn't aware of these IndexedXXX classes. To be consistent with the existing logic, I added the new functions too similarly to specId.



##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -410,11 +410,19 @@ void validateSnapshot(Snapshot old, Snapshot snap, Long sequenceNumber, DataFile
       if (sequenceNumber != null) {
         V1Assert.assertEquals(
             "Data sequence number should default to 0", 0, entry.dataSequenceNumber().longValue());
+        V1Assert.assertEquals(
+            "Data sequence number should default to 0",

Review Comment:
   I added a check for file sequence number.
   
   However, I'm wondering if the checks for dataSequenceNumber are correct. If I'm not mistaken if there was a rewrite then the dataSequenceNumber on the ManifestEntry can differ from the one on the Snapshot. So these checks (even the existing ones) are correct only if there were no rewrites on the table. Is my understanding correct here?



##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -222,6 +227,24 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long dataSequenceNumber() {
+    return this.dataSequenceNumber;
+  }
+
+  public void setDataSequenceNumber(Long dataSequenceNumber) {
+    this.dataSequenceNumber = dataSequenceNumber;
+  }
+
+  @Override
+  public Long fileSequenceNumber() {
+    return this.fileSequenceNumber;

Review Comment:
   Done



##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */

Review Comment:
   Sorry, my bad. I sent some of my comments too early. I'm currently preparing the patch before pushing it to review. 



##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -222,6 +227,24 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long dataSequenceNumber() {
+    return this.dataSequenceNumber;

Review Comment:
   Indeed. Done



##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -562,6 +544,44 @@ void validateDeleteManifest(
     Assert.assertFalse("Should find all files in the manifest", expectedFiles.hasNext());
   }
 
+  private <T extends ContentFile<T>> void validateManifestSequenceNumbers(

Review Comment:
   Good point. I added some additional tests on Snapshot and Scan levels.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189776793


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */

Review Comment:
   Thanks, added.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1187739862


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {

Review Comment:
   based on the spec, we should call this `sequenceNumber` instead of `dataSequenceNumber`? Although I know `dataSequenceNumber` is probably more clear.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1549954583

   Merging since we have a few approvals, and no blocking comments. thanks @gaborkaszab for the PR, and @rdblue @aokolnychyi @stevenzwu @jackye1995  for reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189582501


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {

Review Comment:
   Yeah, I saw that ManifestEntry.sequenceNumber() was removed, hence I used the same naming in ContentFile to be consistent with the current state of ManifestEntry.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1549996259

   Thanks for the reviews, @amogh-jahagirdar @aokolnychyi @rdblue @stevenzwu @jackye1995 !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1194483795


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -478,6 +500,8 @@ public String toString() {
         .add("split_offsets", splitOffsets == null ? "null" : splitOffsets())
         .add("equality_ids", equalityIds == null ? "null" : equalityFieldIds())
         .add("sort_order_id", sortOrderId)
+        .add("data_sequence_number", dataSequenceNumber == null ? "null" : dataSequenceNumber)
+        .add("file_sequence_number", fileSequenceNumber == null ? "null" : fileSequenceNumber)

Review Comment:
   I had the same question but it looks like other vars in this class already use the same pattern. Worth checking, could probably be done in a separate PR.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar merged pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar merged PR #7555:
URL: https://github.com/apache/iceberg/pull/7555


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188005697


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */

Review Comment:
   What about something like this?
   
   ```
   /**
    * Returns the data sequence number of the file.
    *
    * <p>This method represents the sequence number to which the file should apply. Note the data
    * sequence number may differ from the sequence number of the snapshot in which the underlying
    * file was added. New snapshots can add files that belong to older sequence numbers (e.g.
    * compaction). The data sequence number also does not change when the file is marked as deleted.
    *
    * <p>This method can return null if the data sequence number is unknown. This may happen while
    * reading a v2 manifest that did not persist the data sequence number for manifest entries with
    * status DELETED (older Iceberg versions).
    */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188047485


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -562,6 +544,44 @@ void validateDeleteManifest(
     Assert.assertFalse("Should find all files in the manifest", expectedFiles.hasNext());
   }
 
+  private <T extends ContentFile<T>> void validateManifestSequenceNumbers(

Review Comment:
   This is a good way to test the core logic. Shall we also add tests at a higher level? For instance, it would be great to check methods for added/removed files in `Snapshot` behave correctly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188124618


##########
core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java:
##########
@@ -78,6 +73,13 @@ public <F extends ContentFile<F>> ManifestEntry<F> apply(ManifestEntry<F> manife
         manifestEntry.setFileSequenceNumber(sequenceNumber);

Review Comment:
   not caused by this PR. can we fix the typo in the comment? `iff` -> `off`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189241902


##########
core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java:
##########
@@ -78,6 +73,13 @@ public <F extends ContentFile<F>> ManifestEntry<F> apply(ManifestEntry<F> manife
         manifestEntry.setFileSequenceNumber(sequenceNumber);

Review Comment:
   I believe `iff` is correct and stands for `if and only if`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1191355643


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */

Review Comment:
   @gaborkaszab did you push the change? I am still seeing the old Javadoc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189292677


##########
core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java:
##########
@@ -78,6 +73,13 @@ public <F extends ContentFile<F>> ManifestEntry<F> apply(ManifestEntry<F> manife
         manifestEntry.setFileSequenceNumber(sequenceNumber);

Review Comment:
   "iff" here means "if and only if", I think. It's a common short-hand.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1195032379


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Yeah, giving it a second thought, they indeed aren't optional. Added them to the same block as `partitionSpecId`. 



##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -562,6 +553,44 @@ void validateDeleteManifest(
     Assert.assertFalse("Should find all files in the manifest", expectedFiles.hasNext());
   }
 
+  private <T extends ContentFile<T>> void validateManifestSequenceNumbers(
+      ManifestEntry<T> entry, Iterator<Long> dataSeqs, Iterator<Long> fileSeqs) {
+    if (dataSeqs != null) {
+      V1Assert.assertEquals(
+          "Data sequence number should default to 0", 0, entry.dataSequenceNumber().longValue());
+      V1Assert.assertEquals(
+          "Data sequence number should default to 0",
+          0,
+          entry.file().dataSequenceNumber().longValue());
+
+      Long expectedSequenceNumber = dataSeqs.next();
+      V2Assert.assertEquals(
+          "Data sequence number should match expected",
+          expectedSequenceNumber,
+          entry.dataSequenceNumber());
+      V2Assert.assertEquals(
+          "Data sequence number should match expected",
+          expectedSequenceNumber,
+          entry.file().dataSequenceNumber());
+    }

Review Comment:
   Done



##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -478,6 +500,8 @@ public String toString() {
         .add("split_offsets", splitOffsets == null ? "null" : splitOffsets())
         .add("equality_ids", equalityIds == null ? "null" : equalityFieldIds())
         .add("sort_order_id", sortOrderId)
+        .add("data_sequence_number", dataSequenceNumber == null ? "null" : dataSequenceNumber)
+        .add("file_sequence_number", fileSequenceNumber == null ? "null" : fileSequenceNumber)

Review Comment:
   I kept consistency in mind with this code as a few other members above are also added the same way.



##########
core/src/test/java/org/apache/iceberg/TestSnapshot.java:
##########
@@ -170,4 +170,120 @@ public void testCachedDeleteFiles() {
     Assert.assertEquals(
         "Partition must match", thirdSnapshotDeleteFile.partition(), addedDeleteFile.partition());
   }
+
+  @Test
+  public void testSequenceNumbersInAddedDataFiles() {
+    long expectedSequenceNumber = 0L;
+    if (formatVersion >= 2) {
+      expectedSequenceNumber = 1L;
+    }

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188012934


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Is there any particular reason to have these two as a separate block? My initial thought was to put it somewhere next to `partitionSpecId` above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188124618


##########
core/src/main/java/org/apache/iceberg/InheritableMetadataFactory.java:
##########
@@ -78,6 +73,13 @@ public <F extends ContentFile<F>> ManifestEntry<F> apply(ManifestEntry<F> manife
         manifestEntry.setFileSequenceNumber(sequenceNumber);

Review Comment:
   not caused by this PR. can we fix the comment typo in the line 70? `iff` -> `off`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1538627357

   Will take a look today. Thanks, @gaborkaszab!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188042023


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */
+  default Long fileSequenceNumber() {
+    return null;

Review Comment:
   Shall we also implement these methods in `IndexedXXX` in `V1Metadata` and `V2Metadata` and delegate to the wrapper, just like we do for `specId`? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188012934


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Is there any particular reason to have these two as a separate block? Right now, `BaseFile` only have two blocks to keep optional fields separate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189776432


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */

Review Comment:
   Sure, thanks for the suggestion. The comments I added to ContentFile are indeed a bit light. My thinking was that I didn't want to duplicate the comments from ManifestEntry, but you're right, it makes sense to be verbose with the comments in ContentFile as well.
   I merge the 2 proposals above and added to the function.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188042023


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */
+  default Long fileSequenceNumber() {
+    return null;

Review Comment:
   Shall we also implement these methods in `IndexedXXX` in `V1Metadata` and `V2Metadata` and delegate to the wrapper, just like we do for `specId`? There is probably no use case, though. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1191780054


##########
core/src/test/java/org/apache/iceberg/TestSnapshot.java:
##########
@@ -170,4 +170,120 @@ public void testCachedDeleteFiles() {
     Assert.assertEquals(
         "Partition must match", thirdSnapshotDeleteFile.partition(), addedDeleteFile.partition());
   }
+
+  @Test
+  public void testSequenceNumbersInAddedDataFiles() {
+    long expectedSequenceNumber = 0L;
+    if (formatVersion >= 2) {
+      expectedSequenceNumber = 1L;
+    }

Review Comment:
   Nit: could we double check style throughout the test class, if blocks should have a newline after



##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -562,6 +553,44 @@ void validateDeleteManifest(
     Assert.assertFalse("Should find all files in the manifest", expectedFiles.hasNext());
   }
 
+  private <T extends ContentFile<T>> void validateManifestSequenceNumbers(
+      ManifestEntry<T> entry, Iterator<Long> dataSeqs, Iterator<Long> fileSeqs) {
+    if (dataSeqs != null) {
+      V1Assert.assertEquals(
+          "Data sequence number should default to 0", 0, entry.dataSequenceNumber().longValue());
+      V1Assert.assertEquals(
+          "Data sequence number should default to 0",
+          0,
+          entry.file().dataSequenceNumber().longValue());
+
+      Long expectedSequenceNumber = dataSeqs.next();
+      V2Assert.assertEquals(
+          "Data sequence number should match expected",
+          expectedSequenceNumber,
+          entry.dataSequenceNumber());
+      V2Assert.assertEquals(
+          "Data sequence number should match expected",
+          expectedSequenceNumber,
+          entry.file().dataSequenceNumber());
+    }

Review Comment:
   Nit: newline after if



##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -478,6 +500,8 @@ public String toString() {
         .add("split_offsets", splitOffsets == null ? "null" : splitOffsets())
         .add("equality_ids", equalityIds == null ? "null" : equalityFieldIds())
         .add("sort_order_id", sortOrderId)
+        .add("data_sequence_number", dataSequenceNumber == null ? "null" : dataSequenceNumber)
+        .add("file_sequence_number", fileSequenceNumber == null ? "null" : fileSequenceNumber)

Review Comment:
   Nit: I suppose this is fine, but I think add will already take care of converting the null for us (looking at this [link](https://guava.dev/releases/19.0/api/docs/com/google/common/base/MoreObjects.ToStringHelper.html#add(java.lang.String,%20long))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188016136


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -222,6 +227,24 @@ void setSpecId(int specId) {
     this.partitionSpecId = specId;
   }
 
+  @Override
+  public Long dataSequenceNumber() {
+    return this.dataSequenceNumber;

Review Comment:
   minor: We rarely use explicit `this` whenever accessing a var, just when setting.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188123566


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */

Review Comment:
   > The file sequence number is always assigned at commit and cannot be  provided explicitly, unlike the data sequence number. The file sequence number does not change upon assigning.
   
   Can we also use the compaction example? 
   ```
   The file sequence number is always assigned at commit. In case of rewrite (like compaction), 
   file sequence number can be higher than the data sequence number.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188123566


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */

Review Comment:
   > The file sequence number is always assigned at commit and cannot be  provided explicitly, unlike the data sequence number. The file sequence number does not change upon assigning.
   
   Can we also use the compaction example? 
   ```
   The file sequence number is always assigned at commit. In case of rewrite (like compaction), file sequence number can be higher than the data sequence number.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189241538


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */

Review Comment:
   Adding the compaction example makes sense to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1194482684


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Are these two indeed optional? I'd say they belong more to the block with `partitionSpecId`. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1195387229


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -478,6 +500,8 @@ public String toString() {
         .add("split_offsets", splitOffsets == null ? "null" : splitOffsets())
         .add("equality_ids", equalityIds == null ? "null" : equalityFieldIds())
         .add("sort_order_id", sortOrderId)
+        .add("data_sequence_number", dataSequenceNumber == null ? "null" : dataSequenceNumber)
+        .add("file_sequence_number", fileSequenceNumber == null ? "null" : fileSequenceNumber)

Review Comment:
   Cool, I think we can take the simplification in another PR! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188042023


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */
+  default Long fileSequenceNumber() {
+    return null;

Review Comment:
   Shall we also implement these methods in `IndexedXXX` in `V1Metadata` and `V2Metadata` and delegate to the wrapper, just like we do for `specId`? There is probably no use case, though. Anything I missed where that would be needed? If not, we can skip it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188003885


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {
+    return null;
+  }
+
+  /** Returns the file sequence number. */

Review Comment:
   We, probably, need to add more context on what this sequence number represents given that it is a public API. What about something like this?
   
   ```
   /**
    * Returns the file sequence number.
    *
    * <p>The file sequence number represents the sequence number of the snapshot in which the
    * underlying file was added. The file sequence number is always assigned at commit and cannot be
    * provided explicitly, unlike the data sequence number. The file sequence number does not change
    * upon assigning.
    *
    * <p>This method can return null if the file sequence number is unknown. This may happen while
    * reading a v2 manifest that did not persist the file sequence number for manifest entries with
    * status EXISTING or DELETED (older Iceberg versions).
    */
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1187999600


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */
+  default Long dataSequenceNumber() {

Review Comment:
   We actually deprecated and removed `sequenceNumber` in `ManifestEntry` cause it was not clear what it represents. I'd be inclined to match what we currently have in `ManifestEntry`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] gaborkaszab commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "gaborkaszab (via GitHub)" <gi...@apache.org>.
gaborkaszab commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1189781669


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   I didn't realise that pattern with the blocks because 'avroSchema' and 'fromProjectionPos/partitionType' also make their own block. Regardless, I merged the new ones with the optional block.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] stevenzwu commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "stevenzwu (via GitHub)" <gi...@apache.org>.
stevenzwu commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1191356000


##########
api/src/main/java/org/apache/iceberg/ContentFile.java:
##########
@@ -113,6 +113,16 @@ default Integer sortOrderId() {
     return null;
   }
 
+  /** Returns the data sequence number of the snapshot in which the file should be applied. */

Review Comment:
   @gaborkaszab did you push the change? I am still seeing the old Javadoc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#issuecomment-1548786702

   Let me take another look in a bit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #7555: API, Core: Expose file and data sequence numbers through ContentFile

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #7555:
URL: https://github.com/apache/iceberg/pull/7555#discussion_r1188012934


##########
core/src/main/java/org/apache/iceberg/BaseFile.java:
##########
@@ -76,6 +76,9 @@ public PartitionData copy() {
   private byte[] keyMetadata = null;
   private Integer sortOrderId;
 
+  private Long dataSequenceNumber = null;

Review Comment:
   Is there any particular reason to have these two as a separate block? My initial thought was to put them somewhere next to `partitionSpecId` above.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org