You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/10/07 15:12:30 UTC

[GitHub] [iceberg] nastra opened a new pull request, #5773: Core: Deprecate write.manifest-lists.enabled flag

nastra opened a new pull request, #5773:
URL: https://github.com/apache/iceberg/pull/5773

   This depends on https://github.com/apache/iceberg/pull/5734


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5773: Core: Remove write.manifest-lists.enabled and relevant code paths

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978425493


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -246,9 +246,6 @@ private TableProperties() {}
   public static final String WRITE_PARTITION_SUMMARY_LIMIT = "write.summary.partition-limit";
   public static final int WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT = 0;
 
-  public static final String MANIFEST_LISTS_ENABLED = "write.manifest-lists.enabled";

Review Comment:
   good point, will do that



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r989806000


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -76,18 +76,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
       generator.writeEndObject();
     }
 
-    String manifestList = snapshot.manifestListLocation();
-    if (manifestList != null) {
-      // write just the location. manifests should not be embedded in JSON along with a list
-      generator.writeStringField(MANIFEST_LIST, manifestList);
-    } else {
-      // embed the manifest list in the JSON, v1 only
-      generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests(DUMMY_FILE_IO)) {
-        generator.writeString(file.path());
-      }
-      generator.writeEndArray();
-    }
+    // write just the location. manifests should not be embedded in JSON along with a list
+    generator.writeStringField(MANIFEST_LIST, snapshot.manifestListLocation());

Review Comment:
   ok good point, I added that piece back



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra closed pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
nastra closed pull request #5773: Core: Deprecate write.manifest-lists.enabled flag
URL: https://github.com/apache/iceberg/pull/5773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5773: Core: Remove write.manifest-lists.enabled and relevant code paths

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978086354


##########
core/src/main/java/org/apache/iceberg/TableProperties.java:
##########
@@ -246,9 +246,6 @@ private TableProperties() {}
   public static final String WRITE_PARTITION_SUMMARY_LIMIT = "write.summary.partition-limit";
   public static final int WRITE_PARTITION_SUMMARY_LIMIT_DEFAULT = 0;
 
-  public static final String MANIFEST_LISTS_ENABLED = "write.manifest-lists.enabled";

Review Comment:
   Should we deprecate this constant so that people see that it no longer works?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5773: Core: Remove write.manifest-lists.enabled and relevant code paths

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978390263


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -153,35 +139,22 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
 
     Integer schemaId = JsonUtil.getIntOrNull(SCHEMA_ID, node);
 
-    if (node.has(MANIFEST_LIST)) {

Review Comment:
   Using a fake FileIO in `for (ManifestFile file : snapshot.allManifests(DUMMY_FILE_IO))` sounds great and we don't need expose the `v1ManifestLocations` on the `Snapshot` API itself. I went ahead and implemented that and pushed that to https://github.com/apache/iceberg/pull/5734.
   
   I would suggest that we first finish up https://github.com/apache/iceberg/pull/5734 and then come back to this PR to deal with the `write.manifest-lists.enabled` flag (aka deprecating it and removing it from the write path, but still keeping it in the read path in `SnapshotParser` as you indicated above). I will update this PR here accordingly once https://github.com/apache/iceberg/pull/5734 is in



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r989806000


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -76,18 +76,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
       generator.writeEndObject();
     }
 
-    String manifestList = snapshot.manifestListLocation();
-    if (manifestList != null) {
-      // write just the location. manifests should not be embedded in JSON along with a list
-      generator.writeStringField(MANIFEST_LIST, manifestList);
-    } else {
-      // embed the manifest list in the JSON, v1 only
-      generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests(DUMMY_FILE_IO)) {
-        generator.writeString(file.path());
-      }
-      generator.writeEndArray();
-    }
+    // write just the location. manifests should not be embedded in JSON along with a list
+    generator.writeStringField(MANIFEST_LIST, snapshot.manifestListLocation());

Review Comment:
   ok good point, I added that piece back with a test



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5773: Core: Remove write.manifest-lists.enabled and relevant code paths

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978176248


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -153,35 +139,22 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
 
     Integer schemaId = JsonUtil.getIntOrNull(SCHEMA_ID, node);
 
-    if (node.has(MANIFEST_LIST)) {

Review Comment:
   Okay, I took a deeper look into our options here. I don't think that we should break tables that have embedded list of manifests. I'm okay with no longer writing the embedded list, but I don't want to break compatibility with older tables.
   
   That means we need this check.
   
   Reading should work fine by lazily converting to `GenericManifestFile` when a `FileIO` is provided through the `Snapshot` API. However, making changes to the table metadata would break when writing out old snapshots. I was thinking earlier that we could rely on `SnapshotProducer` to rewrite, but if the update is a metadata change then we don't get a new snapshot that uses a list. We could have an operation that rewrites the snapshots to use lists, but that seems like a lot of work.
   
   Instead, I think that the cleanest option is to pass `FileIO` to `allManifests` for the write path where `v1ManifestLocations` is being called. We can either do that by passing in a real `FileIO` or by creating a fake `FileIO`.
   
   I'd use a fake `FileIO` because we know that it is only used to create `InputFile` instances for the list of manifests and then only `path` is called on those input files to return the location. I think that's a clean way to remove `FileIO` without adding a new method to the `Snapshot` interface. It's also pretty small:
   
   ```java
     private static class DummyFileIO implements FileIO {
       @Override
       public InputFile newInputFile(String path) {
         return new InputFile() {
           @Override
           public long getLength() {
             throw new UnsupportedOperationException();
           }
   
           @Override
           public SeekableInputStream newStream() {
             throw new UnsupportedOperationException();
           }
   
           @Override
           public String location() {
             return path;
           }
   
           @Override
           public boolean exists() {
             return true;
           }
         };
       }
   
       @Override
       public OutputFile newOutputFile(String path) {
         throw new UnsupportedOperationException();
       }
   
       @Override
       public void deleteFile(String path) {
         throw new UnsupportedOperationException();
       }
     }
   ```
   
   I think I prefer our options in this order:
   1. Use a fake FileIO to get the files
   2. Keep passing a real FileIO in the write path only
   3. Add `v1ManifestLocations` to the `Snapshot` interface
   
   @nastra, what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r989386266


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -76,18 +76,8 @@ static void toJson(Snapshot snapshot, JsonGenerator generator) throws IOExceptio
       generator.writeEndObject();
     }
 
-    String manifestList = snapshot.manifestListLocation();
-    if (manifestList != null) {
-      // write just the location. manifests should not be embedded in JSON along with a list
-      generator.writeStringField(MANIFEST_LIST, manifestList);
-    } else {
-      // embed the manifest list in the JSON, v1 only
-      generator.writeArrayFieldStart(MANIFESTS);
-      for (ManifestFile file : snapshot.allManifests(DUMMY_FILE_IO)) {
-        generator.writeString(file.path());
-      }
-      generator.writeEndArray();
-    }
+    // write just the location. manifests should not be embedded in JSON along with a list
+    generator.writeStringField(MANIFEST_LIST, snapshot.manifestListLocation());

Review Comment:
   I don't think that we can make this change. We can ensure that all of the snapshots have a manifest list when producing them, but unless we want to change existing snapshots, we can't change the snapshot serialization. If there is not manifest list, we have to write out the manifests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#issuecomment-1271807843

   Thanks, @nastra!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
nastra commented on PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#issuecomment-1256880045

   @rdblue I've rebased this one now


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
rdblue merged PR #5773:
URL: https://github.com/apache/iceberg/pull/5773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #5773: Core: Remove write.manifest-lists.enabled and relevant code paths

Posted by GitBox <gi...@apache.org>.
rdblue commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r978091844


##########
core/src/main/java/org/apache/iceberg/SnapshotParser.java:
##########
@@ -153,35 +139,22 @@ static Snapshot fromJson(FileIO io, JsonNode node) {
 
     Integer schemaId = JsonUtil.getIntOrNull(SCHEMA_ID, node);
 
-    if (node.has(MANIFEST_LIST)) {

Review Comment:
   This is still needed. If Iceberg is reading an old table that embeds the manifest list locations, then there's no reason why it should fail. That would be a backward-incompatible change. As long as we don't write out a form with the manifests, we should be fine.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
rdblue commented on PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#issuecomment-1256715970

   Now that #5734 is in, this needs to be rebased.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #5773: Core: Deprecate write.manifest-lists.enabled flag

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #5773:
URL: https://github.com/apache/iceberg/pull/5773#discussion_r984547407


##########
core/src/test/java/org/apache/iceberg/TestTableMetadata.java:
##########
@@ -91,27 +92,27 @@ public class TestTableMetadata {
   @Test
   public void testJsonConversion() throws Exception {
     long previousSnapshotId = System.currentTimeMillis() - new Random(1234).nextInt(3600);
+
+    String manifestList =
+        createManifestListWithManifestFile(previousSnapshotId, null, "file:/tmp/manifest1.avro");
     Snapshot previousSnapshot =
         new BaseSnapshot(
-            previousSnapshotId,
-            null,
-            previousSnapshotId,
-            null,
-            null,
-            null,
-            ImmutableList.of(
-                new GenericManifestFile(localInput("file:/tmp/manfiest.1.avro"), SPEC_5.specId())));
+            0, previousSnapshotId, null, previousSnapshotId, null, null, null, manifestList);
+
     long currentSnapshotId = System.currentTimeMillis();
+    manifestList =
+        createManifestListWithManifestFile(
+            currentSnapshotId, previousSnapshotId, "file:/tmp/manifest2.avro");
     Snapshot currentSnapshot =
         new BaseSnapshot(
+            0,
             currentSnapshotId,
             previousSnapshotId,
             currentSnapshotId,
             null,
             null,
             7,
-            ImmutableList.of(
-                new GenericManifestFile(localInput("file:/tmp/manfiest.2.avro"), SPEC_5.specId())));
+            manifestList);

Review Comment:
   the previous test was referring to a single manifest file which I'm doing as well via `createManifestListWithManifestFile()`. It doesn't seem to matter in these tests particularly whether the manifest list at this point contains only `/tmp/manifest2.avro` or `/tmp/manifest1.avro` + `/tmp/manifest2.avro`, so just wanted to mention this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org