You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/09 15:44:52 UTC

[GitHub] [iceberg] nastra opened a new pull request, #5734: API/Core: Remove deprecated methods from Snapshot API

nastra opened a new pull request, #5734:
URL: https://github.com/apache/iceberg/pull/5734

   This also refactors the Snapshot API to track v1 manifest locations and lazily load them when needed, thus removing the need for `FileIO` in the different parsers


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r979087293


##########
core/src/main/java/org/apache/iceberg/BaseSnapshot.java:
##########
@@ -126,18 +84,37 @@ class BaseSnapshot implements Snapshot {
       Integer schemaId,
       List<ManifestFile> dataManifests) {
     this(
-        io,
         INITIAL_SEQUENCE_NUMBER,
         snapshotId,
         parentId,
         timestampMillis,
         operation,
         summary,
         schemaId,
-        null);
+        dataManifests.stream().map(ManifestFile::path).toArray(String[]::new));

Review Comment:
   Why not set `allManifests` from this list? I think we're probably dropping metadata here, although it doesn't matter much for the tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r967891641


##########
core/src/main/java/org/apache/iceberg/BaseSnapshot.java:
##########
@@ -126,18 +84,37 @@ class BaseSnapshot implements Snapshot {
       Integer schemaId,
       List<ManifestFile> dataManifests) {
     this(
-        io,
         INITIAL_SEQUENCE_NUMBER,
         snapshotId,
         parentId,
         timestampMillis,
         operation,
         summary,
         schemaId,
-        null);
+        dataManifests.stream().map(ManifestFile::path).collect(Collectors.toList()));

Review Comment:
   Since this is `Serializable`, we should use an array for the paths, rather than a list.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r978516227


##########
api/src/main/java/org/apache/iceberg/Snapshot.java:
##########
@@ -210,12 +154,24 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
   }
 
   /**
-   * Return the location of this snapshot's manifest list, or null if it is not separate.
+   * Return the location of this snapshot's manifest list, or null if it is not separate. If this
+   * method returns null, then {@link Snapshot#v1ManifestLocations()} must return non-null.
    *
    * @return the location of the manifest list for this Snapshot
    */
   String manifestListLocation();
 
+  /**
+   * Returns a collection of different manifest locations for this Snapshot, or null if manifests
+   * are tracked in a single manifest list file (thus {@link Snapshot#manifestListLocation()} must
+   * return non-null).
+   *
+   * @return A collection of different manifest locations for this Snapshot
+   */
+  default List<String> v1ManifestLocations() {

Review Comment:
   As discussed in https://github.com/apache/iceberg/pull/5773#discussion_r978390263 I removed this method again from the `Snapshot` API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r968598297


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -80,8 +77,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
     } else {
       // embed the manifest list in the JSON, v1 only
       generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests()) {
-        generator.writeString(file.path());
+      for (String location : snapshot.v1ManifestLocations()) {

Review Comment:
   the entire problem is basically that we have to provide a `FileIO` instance (which we don't necessarily have) when calling `allManifests(io)`. In order to initialize `allManifests` for the V1 code path of creating a Snapshot, one has to provide a `List<ManifestFile>`, which requires a `FileIO` instance (unless I'm missing an important piece of information). Therefore we need to have a way to retrieve a collection of manifest locations without a dependency to `FileIO`. Does that make sense?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#issuecomment-1243050626

   This looks great to me, other than a minor serialization convention (prefer arrays to lists for Kryo) and adding a public API method that I don't think we need. Thanks, @nastra!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r978391410


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -80,8 +77,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
     } else {
       // embed the manifest list in the JSON, v1 only
       generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests()) {
-        generator.writeString(file.path());
+      for (String location : snapshot.v1ManifestLocations()) {

Review Comment:
   As discussed in https://github.com/apache/iceberg/pull/5773#discussion_r978390263 I changed this to use a fake FileIO to retrieve the paths



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#issuecomment-1248629058

   @rdblue we discussed removing `write.manifest-lists.enabled` and relevant code paths, should we go ahead and merge this PR here and tackle the removal around `write.manifest-lists.enabled` in a separate PR mainly because this PR would get even bigger than it is right now?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#issuecomment-1256715775

   Thanks, @nastra! Looks great. I merged this and we can follow up on #5773.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r978415934


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -158,20 +161,64 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
     } else {
       // fall back to an embedded manifest list. pass in the manifest's InputFile so length can be
       // loaded lazily, if it is needed
-      List<ManifestFile> manifests =
-          Lists.transform(
-              JsonUtil.getStringList(MANIFESTS, node),
-              location -> new GenericManifestFile(io.newInputFile(location), 0));

Review Comment:
   ah nvm, that won't work because we can't read the length with `DUMMY_FILE_IO`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r967892949


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -80,8 +77,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
     } else {
       // embed the manifest list in the JSON, v1 only
       generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests()) {
-        generator.writeString(file.path());
+      for (String location : snapshot.v1ManifestLocations()) {

Review Comment:
   I think this should still be `allManifests`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r967892489


##########
api/src/main/java/org/apache/iceberg/Snapshot.java:
##########
@@ -210,12 +154,24 @@ default Iterable<DeleteFile> removedDeleteFiles(FileIO io) {
   }
 
   /**
-   * Return the location of this snapshot's manifest list, or null if it is not separate.
+   * Return the location of this snapshot's manifest list, or null if it is not separate. If this
+   * method returns null, then {@link Snapshot#v1ManifestLocations()} must return non-null.
    *
    * @return the location of the manifest list for this Snapshot
    */
   String manifestListLocation();
 
+  /**
+   * Returns a collection of different manifest locations for this Snapshot, or null if manifests
+   * are tracked in a single manifest list file (thus {@link Snapshot#manifestListLocation()} must
+   * return non-null).
+   *
+   * @return A collection of different manifest locations for this Snapshot
+   */
+  default List<String> v1ManifestLocations() {

Review Comment:
   I don't see a reason to add this method to the API. The existing methods should be fine for returning these. It doesn't matter where the manifests are tracked to callers.
   
   Looks like this is only used for writing manifests in the metadata file as paths, which can be done using `allManifests`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5734:
URL: https://github.com/apache/iceberg/pull/5734#discussion_r978406424


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -158,20 +161,64 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
     } else {
       // fall back to an embedded manifest list. pass in the manifest's InputFile so length can be
       // loaded lazily, if it is needed
-      List<ManifestFile> manifests =
-          Lists.transform(
-              JsonUtil.getStringList(MANIFESTS, node),
-              location -> new GenericManifestFile(io.newInputFile(location), 0));

Review Comment:
   @rdblue do we want to use the `DUMMY_FILE_IO` instance here as well and keep eagerly reading the manifest file locations? Then we could also remove the internal `v1ManifestLocations` variable from `BaseSnapshot`. Let me know what you think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5734: API/Core: Remove deprecated methods from Snapshot API

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5734:
URL: https://github.com/apache/iceberg/pull/5734


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org