You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "wypoon (via GitHub)" <gi...@apache.org> on 2023/02/10 02:01:23 UTC

[GitHub] [iceberg] wypoon opened a new pull request, #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

wypoon opened a new pull request, #6799:
URL: https://github.com/apache/iceberg/pull/6799

   This is a continuation of https://github.com/apache/iceberg/pull/5893.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108942951


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   Here is an example: https://github.com/apache/iceberg/commit/128d5a161fda076118a7cab1d95ab5064400e08a.
   Deprecate old method and add new replacement method at the same time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116502327


##########
core/src/main/java/org/apache/iceberg/ManifestLists.java:
##########
@@ -21,51 +21,79 @@
 import java.io.IOException;
 import java.util.List;
 import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.avro.AvroIterable;
 import org.apache.iceberg.exceptions.RuntimeIOException;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 class ManifestLists {
   private ManifestLists() {}
 
   static List<ManifestFile> read(InputFile manifestList) {
-    try (CloseableIterable<ManifestFile> files =
-        Avro.read(manifestList)
-            .rename("manifest_file", GenericManifestFile.class.getName())
-            .rename("partitions", GenericPartitionFieldSummary.class.getName())
-            .rename("r508", GenericPartitionFieldSummary.class.getName())
-            .classLoader(GenericManifestFile.class.getClassLoader())
-            .project(ManifestFile.schema())
-            .reuseContainers(false)
-            .build()) {
-
+    try (CloseableIterable<ManifestFile> files = manifestFileIterable(manifestList)) {
       return Lists.newLinkedList(files);
-

Review Comment:
   I think it was an artifact of how diff chose to show the difference, but I add a blank line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111315347


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,
+            compressionCodec,
+            /* compressionLevel */ null)) {
+      for (ManifestFile manifestFile : manifestFiles) {
+        writer.add(manifestFile);
+      }
+    }
+    return outputFile.toInputFile();

Review Comment:
   Style: there should be an empty newline between a control flow block and the following statement.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1107805999


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          return writeManifestList(compressionCodec, manifest);
+        });
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          ManifestFile deleteManifest =
+              writeDeleteManifest(formatVersion, SNAPSHOT_ID, compressionCodec, FILE_A_DELETES);
+          return writeManifestList(compressionCodec, manifest, deleteManifest);
+        });
+  }
+
+  void validateManifestListCompressionCodec(
+      CheckedFunction<String, InputFile> createManifestListFunc) throws IOException {
+    for (Map.Entry<String, String> entry : CODEC_METADATA_MAPPING.entrySet()) {
+      String codec = entry.getKey();
+      String expectedCodecValue = entry.getValue();
+
+      InputFile manifestList = createManifestListFunc.apply(codec);
+      try (AvroIterable<ManifestFile> reader = ManifestLists.manifestFileIterable(manifestList)) {
+        Map<String, String> metadata = reader.getMetadata();
+        Assert.assertEquals(

Review Comment:
   Adopted.



##########
core/src/test/java/org/apache/iceberg/TestManifestWriter.java:
##########
@@ -234,4 +251,24 @@ private DataFile newFile(long recordCount, StructLike partition) {
     }
     return builder.build();
   }
+
+  <F extends ContentFile<F>> void validateManifestCompressionCodec(
+      CheckedFunction<String, ManifestFile> createManifestFunc,
+      CheckedFunction<ManifestFile, ManifestReader<F>> manifestReaderFunc)
+      throws IOException {
+    for (Map.Entry<String, String> entry : CODEC_METADATA_MAPPING.entrySet()) {
+      String codec = entry.getKey();
+      String expectedCodecValue = entry.getValue();
+
+      ManifestFile manifest = createManifestFunc.apply(codec);
+
+      try (ManifestReader<F> reader = manifestReaderFunc.apply(manifest)) {
+        Map<String, String> metadata = reader.metadata();
+        Assert.assertEquals(

Review Comment:
   Adopted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106886325


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -64,7 +69,7 @@ private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
-      PartitionSpec spec, OutputFile outputFile);
+      PartitionSpec spec, OutputFile outputFile, String compressionCodec, Integer compressionLevel);

Review Comment:
   I believe it should be as I outlined in 667aacc59ed5d8b70baf94c6c290f7538d00f790. After one minor release we can then clean up the API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1758776115

   Will do a review by the end of this week, sorry for the delay.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106575723


##########
versions.props:
##########
@@ -48,3 +48,4 @@ com.esotericsoftware:kryo = 4.0.2
 org.eclipse.jetty:* = 9.4.43.v20210629
 org.testcontainers:* = 1.17.5
 io.delta:delta-core_* = 2.2.0
+com.github.luben:zstd-jni = 1.5.2-3

Review Comment:
   Sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1107821274


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          return writeManifestList(compressionCodec, manifest);
+        });
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          ManifestFile deleteManifest =
+              writeDeleteManifest(formatVersion, SNAPSHOT_ID, compressionCodec, FILE_A_DELETES);
+          return writeManifestList(compressionCodec, manifest, deleteManifest);
+        });
+  }
+
+  void validateManifestListCompressionCodec(
+      CheckedFunction<String, InputFile> createManifestListFunc) throws IOException {
+    for (Map.Entry<String, String> entry : CODEC_METADATA_MAPPING.entrySet()) {
+      String codec = entry.getKey();
+      String expectedCodecValue = entry.getValue();
+
+      InputFile manifestList = createManifestListFunc.apply(codec);
+      try (AvroIterable<ManifestFile> reader = ManifestLists.manifestFileIterable(manifestList)) {
+        Map<String, String> metadata = reader.getMetadata();
+        Assert.assertEquals(

Review Comment:
   Btw, instead of defining our own constant, `AVRO_CODEC_KEY`, for "avro.codec", we can use `org.apache.avro.file.DataFileConstants.CODEC`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108121835


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   this is because the new method is also `abstract`. If you make it non-abstract (like https://github.com/apache/iceberg/commit/667aacc59ed5d8b70baf94c6c290f7538d00f790#diff-4e8faf21494a38d340335de99357de31cdca2f3f82af495f832f67fc2446ad49R79), then it won't break the API



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] sumeetgajjar commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "sumeetgajjar (via GitHub)" <gi...@apache.org>.
sumeetgajjar commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108847682


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,

Review Comment:
   `34L` was used as the sequence number in `TestManifestListVersions#SEQ_NUM`.
   https://github.com/apache/iceberg/blob/085b556cca202db900755b7023e6d6aea0ae24fa/core/src/test/java/org/apache/iceberg/TestManifestListVersions.java#L49
   
   Thus decided to use the same value for the newly added tests as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1110461295


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   https://github.com/apache/iceberg/blob/master/CONTRIBUTING.md#minor-version-deprecations-required talks about deprecating things to be removed, but not about adding things. What would help is documentation about how to add new things.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1733296075

   > @nastra if you don't like #8617 can you please merge this instead? I have addressed all your feedback.
   
   I think it would be good to get some additional review from one other committer. /cc @Fokko @amogh-jahagirdar @aokolnychyi could any of you review this one as well please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331084556


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileConstants;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.assertj.core.api.Assertions;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(false);
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);

Review Comment:
   no need to use a Junit4-style assumption. It would be better to use the one from AssertJ: `Assumtions.assumeThat(formatVersion).isGreaterThan(1)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1730099859

   ```
   TestStructuredStreamingRead3 > [catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}] > testReadingStreamFromFutureTimetsamp[catalogName = testhive, implementation = org.apache.iceberg.spark.SparkCatalog, config = {type=hive, default-namespace=default}] FAILED
       org.opentest4j.AssertionFailedError: 
       Expecting value to be true but was false
           at java.base@11.0.20/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at java.base@11.0.20/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at java.base@11.0.20/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at app//org.apache.iceberg.spark.source.TestStructuredStreamingRead3.lambda$testReadingStreamFromFutureTimetsamp$0(TestStructuredStreamingRead3.java:266)
           at java.base@11.0.20/java.util.stream.Streams$RangeIntSpliterator.forEachRemaining(Streams.java:104)
           at java.base@11.0.20/java.util.stream.IntPipeline$Head.forEach(IntPipeline.java:593)
           at app//org.apache.iceberg.spark.source.TestStructuredStreamingRead3.testReadingStreamFromFutureTimetsamp(TestStructuredStreamingRead3.java:263)
   ```
   Is `TestStructuredStreamingRead3` flaky?
   spark-3x-scala-2-13-tests (8, 3.4) and spark-3x-java-17-tests (3.4, 2.13) passed; spark-3x-scala-2-13-tests (11, 3.4) failed due to this. It seems unlikely that it should fail just with java 11.
   spark-3x-scala-2-13-tests (8, 3.5) and spark-3x-scala-2-13-tests (11, 3.5) were cancelled due to the above failure, but  spark-3x-java-17-tests (3.5, 2.13) passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111315743


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileConstants;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.assertj.core.api.Assertions;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          return writeManifestList(compressionCodec, manifest);
+        });
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          ManifestFile deleteManifest =
+              writeDeleteManifest(formatVersion, SNAPSHOT_ID, compressionCodec, FILE_A_DELETES);
+          return writeManifestList(compressionCodec, manifest, deleteManifest);
+        });
+  }
+
+  void validateManifestListCompressionCodec(
+      CheckedFunction<String, InputFile> createManifestListFunc) throws IOException {

Review Comment:
   Passing the write as a function and introducing an interface for it seems a bit too clever. I think it would be better to just combine them and not worry about slight duplication.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116507288


##########
core/src/main/java/org/apache/iceberg/ManifestLists.java:
##########
@@ -21,51 +21,79 @@
 import java.io.IOException;
 import java.util.List;
 import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.avro.AvroIterable;
 import org.apache.iceberg.exceptions.RuntimeIOException;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 class ManifestLists {
   private ManifestLists() {}
 
   static List<ManifestFile> read(InputFile manifestList) {
-    try (CloseableIterable<ManifestFile> files =
-        Avro.read(manifestList)
-            .rename("manifest_file", GenericManifestFile.class.getName())
-            .rename("partitions", GenericPartitionFieldSummary.class.getName())
-            .rename("r508", GenericPartitionFieldSummary.class.getName())
-            .classLoader(GenericManifestFile.class.getClassLoader())
-            .project(ManifestFile.schema())
-            .reuseContainers(false)
-            .build()) {
-
+    try (CloseableIterable<ManifestFile> files = manifestFileIterable(manifestList)) {
       return Lists.newLinkedList(files);
-
     } catch (IOException e) {
       throw new RuntimeIOException(
           e, "Cannot read manifest list file: %s", manifestList.location());
     }
   }
 
+  @VisibleForTesting
+  static AvroIterable<ManifestFile> manifestFileIterable(InputFile manifestList) {
+    return Avro.read(manifestList)
+        .rename("manifest_file", GenericManifestFile.class.getName())
+        .rename("partitions", GenericPartitionFieldSummary.class.getName())
+        .rename("r508", GenericPartitionFieldSummary.class.getName())
+        .classLoader(GenericManifestFile.class.getClassLoader())
+        .project(ManifestFile.schema())
+        .reuseContainers(false)
+        .build();
+  }
+
   static ManifestListWriter write(
       int formatVersion,
       OutputFile manifestListFile,
       long snapshotId,
       Long parentSnapshotId,
       long sequenceNumber) {
+    return write(
+        formatVersion,
+        manifestListFile,
+        snapshotId,
+        parentSnapshotId,
+        sequenceNumber,
+        /* compressionCodec */ null,
+        /* compressionLevel */ null);

Review Comment:
   Removed the comments, as they aren't strictly necessary.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116502327


##########
core/src/main/java/org/apache/iceberg/ManifestLists.java:
##########
@@ -21,51 +21,79 @@
 import java.io.IOException;
 import java.util.List;
 import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.avro.AvroIterable;
 import org.apache.iceberg.exceptions.RuntimeIOException;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 class ManifestLists {
   private ManifestLists() {}
 
   static List<ManifestFile> read(InputFile manifestList) {
-    try (CloseableIterable<ManifestFile> files =
-        Avro.read(manifestList)
-            .rename("manifest_file", GenericManifestFile.class.getName())
-            .rename("partitions", GenericPartitionFieldSummary.class.getName())
-            .rename("r508", GenericPartitionFieldSummary.class.getName())
-            .classLoader(GenericManifestFile.class.getClassLoader())
-            .project(ManifestFile.schema())
-            .reuseContainers(false)
-            .build()) {
-
+    try (CloseableIterable<ManifestFile> files = manifestFileIterable(manifestList)) {
       return Lists.newLinkedList(files);
-

Review Comment:
   I think it was an artifact of how diff chose to show the difference, but I added a blank line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1107805822


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +165,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  static final Map<String, String> CODEC_METADATA_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put("uncompressed", "null")

Review Comment:
   The `Codec` enum in `Avro` was private, so we couldn't use it. I made it public in order to use it as you suggest. In the process, I decided to rename the Map more appropriately and provided an explanatory comment. The keys are `Avro.Codec` values, while the values are now named constants from `org.apache.avro.file.DataFileConstants`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1110460926


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I probably don't understand how revapi works.
   Thank you for the examples, but unfortunately they do not answer the question on my mind. In your example, in the first change, you deprecate a number of methods and annotate that a replacement method should be used. In those cases, the replacement method had already been added (by some earlier change). Then in the second change, the deprecated methods are removed.
   I completely understand deprecating a method in one release, and removing it in the following release. What I don't understand is how to add the replacement method, or how to add a new method.
   In your example, in the first change, either the replacement method has already been added by some earlier change, or you amend a new method to have a default implementation (e.g., throwing an `UnsupportedOperationException`). In the example, this occurs in interfaces. There is unfortunately no example of an abstract class. If I'm not mistaken, you can add a default implementation to a method in an interface, but you cannot add a default implementation to an abstract method in an abstract class, as the method is then no longer abstract.
   
   Let's say I do as you suggest and make the `newAppender` with 4 parameters non-abstract as follow:
   ```
     protected FileAppender<ManifestEntry<F>> newAppender(
         PartitionSpec spec,
         OutputFile outputFile,
         String compressionCodec,
         Integer compressionLevel) {
       return newAppender(spec, outputFile);
     }
   ```
   In the following release, I'd remove the deprecated old `newAppender` with 2 parameters (well and good), and I'd need to change the `newAppender` with 4 parameters to be abstract, since the default implementation no longer makes sense. Wouldn't that break revapi at that point?
   
   Anyway, I just want to get this change in, and I'll make the change you suggest, but it doesn't seem right to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1727587204

   Thanks @wypoon, just some minor comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1730528283

   @nastra @ConeyLiu thanks for reviewing. I have updated the PR and CI is green.
   @ConeyLiu raises a good point about the new `newAppender` in `ManifestWriter`. I put up https://github.com/apache/iceberg/pull/8617 as an alternative. If there are no other sticking points, then I hope one of these two can be merged and make Iceberg 1.4.0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1373486156


##########
core/src/main/java/org/apache/iceberg/ManifestFiles.java:
##########
@@ -157,11 +157,34 @@ public static ManifestWriter<DataFile> write(PartitionSpec spec, OutputFile outp
    */
   public static ManifestWriter<DataFile> write(
       int formatVersion, PartitionSpec spec, OutputFile outputFile, Long snapshotId) {
+    return write(formatVersion, spec, outputFile, snapshotId, null, null);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter} for the given format version.
+   *
+   * @param formatVersion a target format version
+   * @param spec a {@link PartitionSpec}
+   * @param outputFile an {@link OutputFile} where the manifest will be written
+   * @param snapshotId a snapshot ID for the manifest entries, or null for an inherited ID
+   * @param compressionCodec compression codec for the manifest file
+   * @param compressionLevel compression level of the compressionCodec
+   * @return a manifest writer
+   */
+  public static ManifestWriter<DataFile> write(

Review Comment:
   @aokolnychyi thanks for reviewing. I'm interested to hear what @rdblue thinks. In the meantime, let me think about how to address your concern. However, using a builder will mean an API break, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1674370082

   @wypoon do you still have time to work on this? And I am sorry to submit a similar PR. But it would be great to see if any of them can move forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1309450173


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -170,7 +170,9 @@ private ManifestFile copyManifest(ManifestFile manifest) {
         specsById,
         newFile,
         snapshotId(),
-        summaryBuilder);
+        summaryBuilder,
+        current.properties().get(TableProperties.AVRO_COMPRESSION),

Review Comment:
   @nastra shall we follow @rdblue's feedback and keep to using compression codec and level?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1689278343

   @nastra gentle nudge.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1302679619


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -170,7 +170,9 @@ private ManifestFile copyManifest(ManifestFile manifest) {
         specsById,
         newFile,
         snapshotId(),
-        summaryBuilder);
+        summaryBuilder,
+        current.properties().get(TableProperties.AVRO_COMPRESSION),

Review Comment:
   what I like about https://github.com/apache/iceberg/pull/8284 is that it just passes a Map of properties rather than the codec + compression level. Maybe we should do the same here to be more future-proof in case we'd want to pass anything else later on?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108168736


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,

Review Comment:
   I'm guessing we are just writing some random number here to not cause conflicts?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106885157


##########
core/src/main/java/org/apache/iceberg/ManifestListWriter.java:
##########
@@ -31,14 +31,18 @@
 abstract class ManifestListWriter implements FileAppender<ManifestFile> {
   private final FileAppender<ManifestFile> writer;
 
-  private ManifestListWriter(OutputFile file, Map<String, String> meta) {
-    this.writer = newAppender(file, meta);
+  private ManifestListWriter(
+      OutputFile file,
+      Map<String, String> meta,
+      String compressionCodec,
+      Integer compressionLevel) {
+    this.writer = newAppender(file, meta, compressionCodec, compressionLevel);
   }
 
   protected abstract ManifestFile prepare(ManifestFile manifest);
 
   protected abstract FileAppender<ManifestFile> newAppender(
-      OutputFile file, Map<String, String> meta);
+      OutputFile file, Map<String, String> meta, String compressionCodec, Integer compressionLevel);

Review Comment:
   you're right, it's actually `ManifestWriter` and not `ManifestListWriter` (rev api breakages are sometimes difficult to read). 
   As mentioned further above we shouldn't be breaking the API (yet) as we need to go through one Deprecation cycle as indicated in https://github.com/apache/iceberg/blob/master/CONTRIBUTING.md#minor-version-deprecations-required



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1107803308


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   Retained but deprecated the old `newAppender` method, but adding the new one still causes a revapi breakage, as I mentioned before.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1107801736


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +165,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  static final Map<String, String> CODEC_METADATA_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put("uncompressed", "null")
+          .put("zstd", "zstandard")
+          .put("gzip", "deflate")
+          .build();
+
+  static final String AVRO_CODEC_KEY = "avro.codec";
+
+  static final long SNAPSHOT_ID = 987134631982734L;
+
+  private static final long SEQUENCE_NUMBER = 34L;

Review Comment:
   Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111314641


##########
core/src/main/java/org/apache/iceberg/ManifestLists.java:
##########
@@ -21,51 +21,79 @@
 import java.io.IOException;
 import java.util.List;
 import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.avro.AvroIterable;
 import org.apache.iceberg.exceptions.RuntimeIOException;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 class ManifestLists {
   private ManifestLists() {}
 
   static List<ManifestFile> read(InputFile manifestList) {
-    try (CloseableIterable<ManifestFile> files =
-        Avro.read(manifestList)
-            .rename("manifest_file", GenericManifestFile.class.getName())
-            .rename("partitions", GenericPartitionFieldSummary.class.getName())
-            .rename("r508", GenericPartitionFieldSummary.class.getName())
-            .classLoader(GenericManifestFile.class.getClassLoader())
-            .project(ManifestFile.schema())
-            .reuseContainers(false)
-            .build()) {
-
+    try (CloseableIterable<ManifestFile> files = manifestFileIterable(manifestList)) {
       return Lists.newLinkedList(files);
-
     } catch (IOException e) {
       throw new RuntimeIOException(
           e, "Cannot read manifest list file: %s", manifestList.location());
     }
   }
 
+  @VisibleForTesting
+  static AvroIterable<ManifestFile> manifestFileIterable(InputFile manifestList) {
+    return Avro.read(manifestList)
+        .rename("manifest_file", GenericManifestFile.class.getName())
+        .rename("partitions", GenericPartitionFieldSummary.class.getName())
+        .rename("r508", GenericPartitionFieldSummary.class.getName())
+        .classLoader(GenericManifestFile.class.getClassLoader())
+        .project(ManifestFile.schema())
+        .reuseContainers(false)
+        .build();
+  }
+
   static ManifestListWriter write(
       int formatVersion,
       OutputFile manifestListFile,
       long snapshotId,
       Long parentSnapshotId,
       long sequenceNumber) {
+    return write(
+        formatVersion,
+        manifestListFile,
+        snapshotId,
+        parentSnapshotId,
+        sequenceNumber,
+        /* compressionCodec */ null,
+        /* compressionLevel */ null);

Review Comment:
   Style: Comments like this usually follow the argument.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1679808806

   I started working on rebasing this on master. This is somewhat bigger effort than I realized, due to the amount of change in master since this was updated, and will take a bit more time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1681553825

   @nastra can you please look at this again? I have rebased it on master as of some time yesterday. It is, as far as I know, equivalent to the state when you approved it, except that I have also updated Spark 3.4 and Flink 1.17. If you prefer, I can remove the changes from Spark 3.3 and Flink 1.16, but I kept them as they were there from before (those were then the latest Spark and Flink versions).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106679819


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -64,7 +69,7 @@ private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
-      PartitionSpec spec, OutputFile outputFile);
+      PartitionSpec spec, OutputFile outputFile, String compressionCodec, Integer compressionLevel);

Review Comment:
   If I do
   ```
     protected FileAppender<ManifestEntry<F>> newAppender(PartitionSpec spec, OutputFile outputFile) {
       return newAppender(spec, outputFile, null, null);
     }
   
     protected abstract FileAppender<ManifestEntry<F>> newAppender(
         PartitionSpec spec, OutputFile outputFile, String compressionCodec, Integer compressionLevel);
   ```
   I get
   ```
   > There were Java public API/ABI breaks reported by revapi:
     
     java.method.abstractMethodAdded: Abstract method was added.
     
     old: <none>
     new: method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>> org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec, org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)
     
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106882841


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I think we can still do it without breaking the API. I created https://github.com/apache/iceberg/commit/1b237d570197d52fc0c7d959abc28ff14830d9aa in my fork that outlines it. 
   
   Basically we need to go through one deprecation cycle (one minor release) as indicated in https://github.com/apache/iceberg/blob/master/CONTRIBUTING.md#minor-version-deprecations-required before we can break the API here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108934159


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,

Review Comment:
   Thanks Sumeet!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111313870


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +167,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  // Mapping of Avro codec name used by Iceberg to name used by Avro (and appearing in Avro metadata
+  // under the key, avro.codec).
+  // In tests, we use the Iceberg name to specify the codec, and we verify the codec used by reading
+  // the Avro metadata and checking for the Avro name in avro.codec.
+  static final Map<String, String> AVRO_CODEC_NAME_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put(Avro.Codec.UNCOMPRESSED.name(), DataFileConstants.NULL_CODEC)
+          .put(Avro.Codec.ZSTD.name(), DataFileConstants.ZSTANDARD_CODEC)
+          .put(Avro.Codec.GZIP.name(), DataFileConstants.DEFLATE_CODEC)
+          .build();
+
+  static final long SNAPSHOT_ID = 987134631982734L;

Review Comment:
   Actually, I see some tests with `DUMMY_SNAPSHOT_ID`. Can you reuse that instead of making a new one? You may need to move it here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332232576


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java:
##########
@@ -240,7 +240,10 @@ private List<ManifestFile> writeManifestsForUnpartitionedTable(
                 formatVersion,
                 combinedPartitionType,
                 spec,
-                sparkType),
+                sparkType,
+                table.properties().get(TableProperties.AVRO_COMPRESSION),
+                PropertyUtil.propertyAsNullableInt(
+                    table.properties(), TableProperties.AVRO_COMPRESSION_LEVEL)),

Review Comment:
   Will make the changes for Spark 3.5 as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116418672


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -226,7 +226,9 @@ public Snapshot apply() {
             manifestList,
             snapshotId(),
             parentSnapshotId,
-            sequenceNumber)) {
+            sequenceNumber,
+            ops.current().properties().get(TableProperties.AVRO_COMPRESSION),
+            ops.current().propertyAsNullableInt(TableProperties.AVRO_COMPRESSION_LEVEL))) {

Review Comment:
   Ok. The call here to `Manifests.write` uses `ops.current().formatVersion()` for the format version; should I change that too? There are other uses of `ops.current()` in `SnapshotProducer` as well; should I change those too?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116504056


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -226,7 +226,9 @@ public Snapshot apply() {
             manifestList,
             snapshotId(),
             parentSnapshotId,
-            sequenceNumber)) {
+            sequenceNumber,
+            ops.current().properties().get(TableProperties.AVRO_COMPRESSION),
+            ops.current().propertyAsNullableInt(TableProperties.AVRO_COMPRESSION_LEVEL))) {

Review Comment:
   I changed `ops.current()` to `base` in the methods that I modified, but not in the other methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1442874592

   @rdblue I have tried to address all your feedback. The only comment I did not understand is the one about not exposing `Avro.Codec`; I'm not sure what you'd like me to do. I'm fine with reverting it to private and simply using plain Strings in `TestTableBase.AVRO_CODEC_NAME_MAPPING`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116126444


##########
versions.props:
##########
@@ -48,3 +48,4 @@ com.esotericsoftware:kryo = 4.0.2
 org.eclipse.jetty:* = 9.4.43.v20210629
 org.testcontainers:* = 1.17.5
 io.delta:delta-core_* = 2.2.0
+com.github.luben:zstd-jni = 1.5.4-1

Review Comment:
   It is needed because new tests added run with a few different codecs, including zstd. Without the dependency, we get
   ```
       java.lang.NoClassDefFoundError: com/github/luben/zstd/ZstdOutputStreamNoFinalizer
           at org.apache.avro.file.ZstandardCodec.compress(ZstandardCodec.java:73)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1675396680

   I will update this PR so that it can be merged.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1432166680

   @nastra, thank you for your reviews. I have addressed all your feedback. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1105656897


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +165,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  static final Map<String, String> CODEC_METADATA_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put("uncompressed", "null")

Review Comment:
   nit: as mentioned on https://github.com/apache/iceberg/pull/5893 I still think this should use the enums from `AvroCodec` as it's not clear that those are actually avro codecs. I'm curious what others think here



##########
core/src/main/java/org/apache/iceberg/ManifestListWriter.java:
##########
@@ -31,14 +31,18 @@
 abstract class ManifestListWriter implements FileAppender<ManifestFile> {
   private final FileAppender<ManifestFile> writer;
 
-  private ManifestListWriter(OutputFile file, Map<String, String> meta) {
-    this.writer = newAppender(file, meta);
+  private ManifestListWriter(
+      OutputFile file,
+      Map<String, String> meta,
+      String compressionCodec,
+      Integer compressionLevel) {
+    this.writer = newAppender(file, meta, compressionCodec, compressionLevel);
   }
 
   protected abstract ManifestFile prepare(ManifestFile manifest);
 
   protected abstract FileAppender<ManifestFile> newAppender(
-      OutputFile file, Map<String, String> meta);
+      OutputFile file, Map<String, String> meta, String compressionCodec, Integer compressionLevel);

Review Comment:
   this is what's causing the RevAPI breaks. I think it would be better to have 
   ```
   protected FileAppender<ManifestFile> newAppender(
         OutputFile file,
         Map<String, String> meta,
         String compressionCodec,
         Integer compressionLevel) {
       return newAppender(file, meta, null, null);
     }
   ```
   which calls the original `newAppender(..)` method with either nulls or default values for compression codec & level



##########
versions.props:
##########
@@ -48,3 +48,4 @@ com.esotericsoftware:kryo = 4.0.2
 org.eclipse.jetty:* = 9.4.43.v20210629
 org.testcontainers:* = 1.17.5
 io.delta:delta-core_* = 2.2.0
+com.github.luben:zstd-jni = 1.5.2-3

Review Comment:
   looks like there's a newer version available ([v1.5.4-1](https://github.com/luben/zstd-jni/releases/tag/v1.5.4-1)), should we switch to that?
   



##########
core/src/test/java/org/apache/iceberg/TestManifestWriter.java:
##########
@@ -234,4 +251,24 @@ private DataFile newFile(long recordCount, StructLike partition) {
     }
     return builder.build();
   }
+
+  <F extends ContentFile<F>> void validateManifestCompressionCodec(
+      CheckedFunction<String, ManifestFile> createManifestFunc,
+      CheckedFunction<ManifestFile, ManifestReader<F>> manifestReaderFunc)
+      throws IOException {
+    for (Map.Entry<String, String> entry : CODEC_METADATA_MAPPING.entrySet()) {
+      String codec = entry.getKey();
+      String expectedCodecValue = entry.getValue();
+
+      ManifestFile manifest = createManifestFunc.apply(codec);
+
+      try (ManifestReader<F> reader = manifestReaderFunc.apply(manifest)) {
+        Map<String, String> metadata = reader.metadata();
+        Assert.assertEquals(

Review Comment:
   can be simplified to `Assertions.assertThat(reader.getMetadata()).containsEntry(AVRO_CODEC_KEY, expectedCodecValue);`



##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,80 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          return writeManifestList(compressionCodec, manifest);
+        });
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          ManifestFile deleteManifest =
+              writeDeleteManifest(formatVersion, SNAPSHOT_ID, compressionCodec, FILE_A_DELETES);
+          return writeManifestList(compressionCodec, manifest, deleteManifest);
+        });
+  }
+
+  void validateManifestListCompressionCodec(
+      CheckedFunction<String, InputFile> createManifestListFunc) throws IOException {
+    for (Map.Entry<String, String> entry : CODEC_METADATA_MAPPING.entrySet()) {
+      String codec = entry.getKey();
+      String expectedCodecValue = entry.getValue();
+
+      InputFile manifestList = createManifestListFunc.apply(codec);
+      try (AvroIterable<ManifestFile> reader = ManifestLists.manifestFileIterable(manifestList)) {
+        Map<String, String> metadata = reader.getMetadata();
+        Assert.assertEquals(

Review Comment:
   can be simplified to `Assertions.assertThat(reader.getMetadata()).containsEntry(AVRO_CODEC_KEY, expectedCodecValue);`



##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   rather than having to break the API I think it would be better to define a new `newAppender(..)` method that by default calls the original one with null/default params. See also my comment further below.
   
   Once this is done, you shouldn't have any diff on `revapi.yml` anymore.



##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +165,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  static final Map<String, String> CODEC_METADATA_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put("uncompressed", "null")
+          .put("zstd", "zstandard")
+          .put("gzip", "deflate")
+          .build();
+
+  static final String AVRO_CODEC_KEY = "avro.codec";
+
+  static final long SNAPSHOT_ID = 987134631982734L;
+
+  private static final long SEQUENCE_NUMBER = 34L;

Review Comment:
   agreed, there doesn't seem too much value in having this in a constant field



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106573809


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I think you have it backwards, the original `newAppender` has 2 parameters, and we need a new `newAppender` with 4 parameters. The `newAppender` with 2 parameters can call the one with 4 parameters, passing null for the additional parameters, and we can make this the default for the `newAppender` with 2 parameters, but then it is no longer abstract, and we still have the same revapi breakage. (We get a "java.method.numberOfParametersChanged" for the abstract `newAppender` method.)
   I had considered and tried this, and rejected it for that reason.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1705099280

   sorry for the delay, I'm OOO for two weeks and will get back to this in a week (unless somebody else gets to it earlier)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331087821


##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/actions/RewriteManifestsSparkAction.java:
##########
@@ -240,7 +240,10 @@ private List<ManifestFile> writeManifestsForUnpartitionedTable(
                 formatVersion,
                 combinedPartitionType,
                 spec,
-                sparkType),
+                sparkType,
+                table.properties().get(TableProperties.AVRO_COMPRESSION),
+                PropertyUtil.propertyAsNullableInt(
+                    table.properties(), TableProperties.AVRO_COMPRESSION_LEVEL)),

Review Comment:
   do we need to do the same for Spark 3.5?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332249764


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -229,18 +252,34 @@ protected ManifestEntry<DataFile> prepare(ManifestEntry<DataFile> entry) {
     @Override
     protected FileAppender<ManifestEntry<DataFile>> newAppender(
         PartitionSpec spec, OutputFile file) {
+      return newAppender(spec, file, null, null);
+    }
+
+    @Override
+    protected FileAppender<ManifestEntry<DataFile>> newAppender(
+        PartitionSpec spec, OutputFile file, String compressionCodec, Integer compressionLevel) {
       Schema manifestSchema = V2Metadata.entrySchema(spec.partitionType());
       try {
-        return Avro.write(file)
-            .schema(manifestSchema)
-            .named("manifest_entry")
-            .meta("schema", SchemaParser.toJson(spec.schema()))
-            .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
-            .meta("partition-spec-id", String.valueOf(spec.specId()))
-            .meta("format-version", "2")
-            .meta("content", "data")
-            .overwrite()
-            .build();
+        Avro.WriteBuilder builder =
+            Avro.write(file)
+                .schema(manifestSchema)
+                .named("manifest_entry")
+                .meta("schema", SchemaParser.toJson(spec.schema()))
+                .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
+                .meta("partition-spec-id", String.valueOf(spec.specId()))
+                .meta("format-version", "2")
+                .meta("content", "data")
+                .overwrite();
+
+        if (compressionCodec != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION, compressionCodec);
+        }
+
+        if (compressionLevel != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION_LEVEL, compressionLevel.toString());

Review Comment:
   In the original PR (https://github.com/apache/iceberg/pull/5893), @sumeetgajjar used a String for compressionLevel  in the API initially, and @rdblue commented that compression level is an int, not a string. Sumeet then changed it to an Integer, to allow for the fact that some codecs do not have compression level (so it can be null).
   I think that conceptually, it makes sense for compressionLevel to be an Integer in the API. In the implementation, it so happens that we call `Avro.WriteBuilder` and that makes us use a `set(String, String)` method to configure it, and ironically `Avro.WriteBuilder.Context` has to convert the String for compression level back into an int. But that is all implementation detail and an accident of the implementation.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332350720


##########
core/src/test/java/org/apache/iceberg/TestManifestWriter.java:
##########
@@ -384,6 +386,17 @@ private void checkManifests(
     }
   }
 
+  @Test
+  public void testWriteManifestWithCompression() throws IOException {
+    validateManifestCompressionCodec(false);
+  }
+
+  @Test
+  public void testWriteDeleteManifestWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);

Review Comment:
   Ok. There are other existing tests in this class that use Assume. I'll change them too in that case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1724683152

   @nastra I hope you are refreshed from your time off, and can revisit this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332298345


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -52,20 +52,38 @@ public abstract class ManifestWriter<F extends ContentFile<F>> implements FileAp
   private long deletedRows = 0L;
   private Long minDataSequenceNumber = null;
 
-  private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
+  private ManifestWriter(
+      PartitionSpec spec,
+      OutputFile file,
+      Long snapshotId,
+      String compressionCodec,
+      Integer compressionLevel) {
     this.file = file;
     this.specId = spec.specId();
-    this.writer = newAppender(spec, file);
+    this.writer = newAppender(spec, file, compressionCodec, compressionLevel);
     this.snapshotId = snapshotId;
     this.reused = new GenericManifestEntry<>(spec.partitionType());
     this.stats = new PartitionSummary(spec);
   }
 
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
+  /**
+   * @deprecated since 1.4.0, will be removed in 1.5.0; use {@link
+   *     ManifestWriter#newAppender(PartitionSpec, OutputFile, String, Integer)} instead.
+   */
+  @Deprecated
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
       PartitionSpec spec, OutputFile outputFile);
 
+  protected FileAppender<ManifestEntry<F>> newAppender(

Review Comment:
   I'm not sure I understand your comment. I'm keeping the existing method abstract so as not to break the API. This one has an implementation, so it cannot be abstract.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332523816


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -236,12 +254,22 @@ ManifestFile writeManifest(DataFile... files) throws IOException {
   }
 
   ManifestFile writeManifest(Long snapshotId, DataFile... files) throws IOException {
-    File manifestFile = temp.newFile("input.m0.avro");
+    return writeManifest(snapshotId, null, files);
+  }
+
+  ManifestFile writeManifest(Long snapshotId, String compressionCodec, DataFile... files)
+      throws IOException {
+    File manifestFile = temp.newFile();

Review Comment:
   Both the `InMemoryOutputFile` and `FileBasedOutputFile` should work as expected. I think the `InMemoryOutputFile` usage is aimed at accelerating the testing. Hold on to your thoughts. It is not a great deal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1426085222

   I am picking up https://github.com/apache/iceberg/pull/5893 where @sumeetgajjar left off, as he is now pursuing other projects.
   @rdblue I believe Sumeet has addressed your feedback. Compression level is an Integer, to allow for it to be null (for codecs that do not use compression level), and I have used `PropertyUtil.propertyAsNullableInt` to turn the String into an Integer. Sumeet has removed unnecessary changes from tests and only added new tests.
   I have rebased Sumeet's changes on master, fixed some issues, and moved the Flink changes from 1.15 to 1.16 (the current default Flink version).
   This is ready for review.
   cc @nastra and @amogh-jahagirdar who reviewed the original PR as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1110460926


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I probably don't understand how revapi works.
   Thank you for the examples, but unfortunately they do not answer the question on my mind. In your example, in the first change, you deprecate a number of methods and annotate that a replacement method should be used. In those cases, the replacement method had already been added (by some earlier change). Then in the second change, the deprecated methods are removed.
   I completely understand deprecating a method in one release, and removing it in the following release. What I don't understand is how to add the replacement method, or how to add a new method.
   In your example, in the first change, either the replacement method has already been added by some earlier change, or you amend a new method to have a default implementation (e.g., throwing an `UnsupportedOperationException`). In the example, this occurs in interfaces. There is unfortunately no example of an abstract class. If I'm not mistaken, you can add a default implementation to a method in an interface, but you cannot add a default implementation to an abstract method in an abstract class, as the method is then no longer abstract.
   
   Let's say I do as you suggest and make the `newAppender` with 4 parameters non-abstract as follow:
   ```
     protected FileAppender<ManifestEntry<F>> newAppender(
         PartitionSpec spec,
         OutputFile outputFile,
         String compressionCodec,
         Integer compressionLevel) {
       return newAppender(spec, outputFile);
     }
   ```
   In the following release, I'd remove the deprecated old `newAppender` with 2 parameters (well and good), and I'd need to change the `newAppender` with 4 parameters to be abstract, since the default implementation no longer makes sense. Wouldn't that break revapi at that point?
   
   Anyway, I just want to get this change in, and I make the change you suggest, but it doesn't seem right to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111313732


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +167,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  // Mapping of Avro codec name used by Iceberg to name used by Avro (and appearing in Avro metadata
+  // under the key, avro.codec).
+  // In tests, we use the Iceberg name to specify the codec, and we verify the codec used by reading
+  // the Avro metadata and checking for the Avro name in avro.codec.
+  static final Map<String, String> AVRO_CODEC_NAME_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put(Avro.Codec.UNCOMPRESSED.name(), DataFileConstants.NULL_CODEC)
+          .put(Avro.Codec.ZSTD.name(), DataFileConstants.ZSTANDARD_CODEC)
+          .put(Avro.Codec.GZIP.name(), DataFileConstants.DEFLATE_CODEC)
+          .build();
+
+  static final long SNAPSHOT_ID = 987134631982734L;

Review Comment:
   How about a name that signals that this is an example value? Like `EXAMPLE_SNAPSHOT_ID`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111314095


##########
core/src/main/java/org/apache/iceberg/SnapshotProducer.java:
##########
@@ -226,7 +226,9 @@ public Snapshot apply() {
             manifestList,
             snapshotId(),
             parentSnapshotId,
-            sequenceNumber)) {
+            sequenceNumber,
+            ops.current().properties().get(TableProperties.AVRO_COMPRESSION),
+            ops.current().propertyAsNullableInt(TableProperties.AVRO_COMPRESSION_LEVEL))) {

Review Comment:
   This shouldn't refer to `ops`. Instead, use `base`. We want to avoid referencing `ops` directly even if it is available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111315383


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -733,4 +803,9 @@ void assertEquals(String context, Object expected, Object actual) {
   protected interface Action {
     void invoke();
   }
+
+  @FunctionalInterface
+  interface CheckedFunction<T, R> {

Review Comment:
   This doesn't seem necessary. Can you just use `Function` and swallow the IOException?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1110460926


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I probably don't understand how revapi works.
   Thank you for the examples, but unfortunately they do not answer the question on my mind. In your example, in the first change, you deprecate a number of methods and annotate that a replacement method should be used. In those cases, **the replacement method had already been added (by some earlier change)**. Then in the second change, the deprecated methods are removed.
   I completely understand deprecating a method in one release, and removing it in the following release. What I don't understand is how to add the replacement method, or how to add a new method.
   In your example, in the first change, either the replacement method has already been added by some earlier change, or you amend a new method to have a default implementation (e.g., throwing an `UnsupportedOperationException`). In the example, this occurs in interfaces. There is unfortunately no example of an abstract class. If I'm not mistaken, you can add a default implementation to a method in an interface, but you cannot add a default implementation to an abstract method in an abstract class, as the method is then no longer abstract.
   
   Let's say I do as you suggest and make the `newAppender` with 4 parameters non-abstract as follow:
   ```
     protected FileAppender<ManifestEntry<F>> newAppender(
         PartitionSpec spec,
         OutputFile outputFile,
         String compressionCodec,
         Integer compressionLevel) {
       return newAppender(spec, outputFile);
     }
   ```
   In the following release, I'd remove the deprecated old `newAppender` with 2 parameters (well and good), and I'd need to change the `newAppender` with 4 parameters to be abstract, since the default implementation no longer makes sense. Wouldn't that break revapi at that point?
   
   Anyway, I just want to get this change in, and I'll make the change you suggest, but it doesn't seem right to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116503029


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,
+            compressionCodec,
+            /* compressionLevel */ null)) {
+      for (ManifestFile manifestFile : manifestFiles) {
+        writer.add(manifestFile);
+      }
+    }
+    return outputFile.toInputFile();

Review Comment:
   Added a blank line.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1833125326

   Hmm, I think TestExpireSnapshotsAction > dataFilesCleanupWithParallelTasks might be a flaky test?
   All Spark 3 tests passed with Java 8 and 11, and even for Java 17, they passed for Scala 2.13, so it seems unlikely that there is a problem just with Scala 2.12 on Java 17.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332350444


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileConstants;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.assertj.core.api.Assertions;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(false);
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);

Review Comment:
   Ok, will use Assumptions.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331436496


##########
core/src/main/java/org/apache/iceberg/TableMetadata.java:
##########
@@ -472,6 +472,10 @@ public int propertyAsInt(String property, int defaultValue) {
     return PropertyUtil.propertyAsInt(properties, property, defaultValue);
   }
 
+  public Integer propertyAsNullableInt(String property) {

Review Comment:
   This is not needed if we pass the `compression level` as a string.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331088907


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -170,7 +170,9 @@ private ManifestFile copyManifest(ManifestFile manifest) {
         specsById,
         newFile,
         snapshotId(),
-        summaryBuilder);
+        summaryBuilder,
+        current.properties().get(TableProperties.AVRO_COMPRESSION),

Review Comment:
   ah sorry I didn't see that this came up earlier. Then let's go with what @rdblue suggested



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332358837


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -236,12 +254,22 @@ ManifestFile writeManifest(DataFile... files) throws IOException {
   }
 
   ManifestFile writeManifest(Long snapshotId, DataFile... files) throws IOException {
-    File manifestFile = temp.newFile("input.m0.avro");
+    return writeManifest(snapshotId, null, files);
+  }
+
+  ManifestFile writeManifest(Long snapshotId, String compressionCodec, DataFile... files)
+      throws IOException {
+    File manifestFile = temp.newFile();

Review Comment:
   Firstly, `TestManifestListVersions` is about testing versions (v1 versus v2). Secondly, `TestManifestListVersions` has a `writeManifestList` method (which uses an `InMemoryOutputFile` as you observed), but it doesn't have any `writeManifest` method. So there is nothing in `TestManifestListVersions` to reuse for this, and the tests I'm adding around compression in manifests and manifest lists do not fit the purpose of `TestManifestListVersions`.
   
   I don't think it's an issue that we're actually writing to disk, is it? There are other test classes calling `writeManifest` (not just `TestManifestWriter` and `TestManifestListWriter`) and I didn't check if and how they might be affected if we change this to using `InMemoryOutputFile`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1372337825


##########
core/src/main/java/org/apache/iceberg/ManifestFiles.java:
##########
@@ -157,11 +157,34 @@ public static ManifestWriter<DataFile> write(PartitionSpec spec, OutputFile outp
    */
   public static ManifestWriter<DataFile> write(
       int formatVersion, PartitionSpec spec, OutputFile outputFile, Long snapshotId) {
+    return write(formatVersion, spec, outputFile, snapshotId, null, null);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter} for the given format version.
+   *
+   * @param formatVersion a target format version
+   * @param spec a {@link PartitionSpec}
+   * @param outputFile an {@link OutputFile} where the manifest will be written
+   * @param snapshotId a snapshot ID for the manifest entries, or null for an inherited ID
+   * @param compressionCodec compression codec for the manifest file
+   * @param compressionLevel compression level of the compressionCodec
+   * @return a manifest writer
+   */
+  public static ManifestWriter<DataFile> write(

Review Comment:
   I thought I already commented before going on vacation but can't seem to find the old discussion. Sorry if I post the same question again. Have we considered using a builder? My worry with the current approach was that we need to offer an overloaded method every time we add a new parameter. 
   
   @wypoon @nastra @rdblue?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106573809


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I think you have it backwards, the original `newAppender` has 2 parameters, and we need a new `newAppender` with 4 parameters. The `newAppender` with 2 parameters can call the one with 4 parameters, passing null for the additional parameters, and we can make this the default for the `newAppender` with 2 parameters, but then we get a different revapi breakage.
   I had considered and tried this, and rejected it for that reason.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106882841


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I think we can still do it without breaking the API. I created https://github.com/apache/iceberg/commit/667aacc59ed5d8b70baf94c6c290f7538d00f790 in my fork that outlines it. 
   
   Basically we need to go through one deprecation cycle (one minor release) as indicated in https://github.com/apache/iceberg/blob/master/CONTRIBUTING.md#minor-version-deprecations-required before we can break the API here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108930721


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I looked at some of the historical justifications in the revapi.yml. I see, e.g., a number of
   ```
         justification: "Allow adding a new method to the interface - old method is deprecated"
   ```
   I think that is the way, isn't it? Deprecate the old abstract `newAppender` method (with 2 parameters), add a new abstract `newAppender` method (with 4 parameters), and justify it as
   ```
         justification: "Allow adding a new method to the abstract class - old method is deprecated"
   ```
   Otherwise, how would you ever change the API? Am I missing something?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116245264


##########
core/src/main/java/org/apache/iceberg/avro/Avro.java:
##########
@@ -69,7 +69,7 @@
 public class Avro {
   private Avro() {}
 
-  private enum Codec {
+  public enum Codec {

Review Comment:
   I'm afraid I don't understand what you're suggesting.
   The Map in `TestTableBase` that is now named `AVRO_CODEC_NAME_MAPPING` originally used plain Strings, and @nastra advocated using `Avro.Codec` values, but in order to do that, I had to make the enum public. To be clear, `AVRO_CODEC_NAME_MAPPING` is still a `Map<String, String>`:
   ```
     static final Map<String, String> AVRO_CODEC_NAME_MAPPING =
         ImmutableMap.<String, String>builder()
             .put(Avro.Codec.UNCOMPRESSED.name(), DataFileConstants.NULL_CODEC)
             .put(Avro.Codec.ZSTD.name(), DataFileConstants.ZSTANDARD_CODEC)
             .put(Avro.Codec.GZIP.name(), DataFileConstants.DEFLATE_CODEC)
             .build();
   ```
   Referencing the `Avro.Codec` values is simply to make clear that those Strings are codec names.
   I'm fine with using plain Strings instead. Is that what you're suggesting?
   
   I don't understand your comment about adding a `fromName` method. Do you mean to add a (static) method to `Avro.Codec` that takes a String and returns the enum? How would I be able to use it, if `Avro.Codec` remains private? I can't call it from outside `Avro`, because the enum is private, and I can't use the return value, for the same reason.
   
   I do agree that `Avro.Codec` should be private, because the only use of it is for switching on the values within a private static method in `Avro`, to return the appropriate `org.apache.avro.file.CodecFactory`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111313960


##########
core/src/main/java/org/apache/iceberg/avro/Avro.java:
##########
@@ -69,7 +69,7 @@
 public class Avro {
   private Avro() {}
 
-  private enum Codec {
+  public enum Codec {

Review Comment:
   Rather than exposing this, can we just use a String instead?
   
   When we do that, we typically also add a `fromName` method to the internal enums.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1698276764

   Rebased on master again to resolve merge conflicts with gradle/libs.versions.toml.
   @aokolnychyi would you take a look as well?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1303600278


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -170,7 +170,9 @@ private ManifestFile copyManifest(ManifestFile manifest) {
         specsById,
         newFile,
         snapshotId(),
-        summaryBuilder);
+        summaryBuilder,
+        current.properties().get(TableProperties.AVRO_COMPRESSION),

Review Comment:
   Funnily enough, this is what @sumeetgajjar did initially (see https://github.com/apache/iceberg/pull/6799/commits/62cd446ffc99d83062e2082acb5a26cdfa517c00).
   However, he changed it after the initial feedback (see https://github.com/apache/iceberg/pull/6799/commits/7ab1d5c26bddca0e3540b12eeae48a5be3c2cd84).
   The feedback that led him to do this is from @rdblue: https://github.com/apache/iceberg/pull/5893#issuecomment-1267261389 (see also https://github.com/apache/iceberg/pull/5893#discussion_r987094412).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1305105988


##########
core/src/main/java/org/apache/iceberg/BaseRewriteManifests.java:
##########
@@ -170,7 +170,9 @@ private ManifestFile copyManifest(ManifestFile manifest) {
         specsById,
         newFile,
         snapshotId(),
-        summaryBuilder);
+        summaryBuilder,
+        current.properties().get(TableProperties.AVRO_COMPRESSION),

Review Comment:
   @nastra please let me know what you think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331480148


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -236,12 +254,22 @@ ManifestFile writeManifest(DataFile... files) throws IOException {
   }
 
   ManifestFile writeManifest(Long snapshotId, DataFile... files) throws IOException {
-    File manifestFile = temp.newFile("input.m0.avro");
+    return writeManifest(snapshotId, null, files);
+  }
+
+  ManifestFile writeManifest(Long snapshotId, String compressionCodec, DataFile... files)
+      throws IOException {
+    File manifestFile = temp.newFile();

Review Comment:
   It uses `InMemoryOutputFile` in `TestManifestListVersions`. Or maybe we could put those UTs to `TestManifestListVersions` to reuse those test util methods.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332517161


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -52,20 +52,38 @@ public abstract class ManifestWriter<F extends ContentFile<F>> implements FileAp
   private long deletedRows = 0L;
   private Long minDataSequenceNumber = null;
 
-  private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
+  private ManifestWriter(
+      PartitionSpec spec,
+      OutputFile file,
+      Long snapshotId,
+      String compressionCodec,
+      Integer compressionLevel) {
     this.file = file;
     this.specId = spec.specId();
-    this.writer = newAppender(spec, file);
+    this.writer = newAppender(spec, file, compressionCodec, compressionLevel);
     this.snapshotId = snapshotId;
     this.reused = new GenericManifestEntry<>(spec.partitionType());
     this.stats = new PartitionSummary(spec);
   }
 
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
+  /**
+   * @deprecated since 1.4.0, will be removed in 1.5.0; use {@link
+   *     ManifestWriter#newAppender(PartitionSpec, OutputFile, String, Integer)} instead.
+   */
+  @Deprecated
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
       PartitionSpec spec, OutputFile outputFile);
 
+  protected FileAppender<ManifestEntry<F>> newAppender(

Review Comment:
   The `newAppender(PartitionSpec spec, OutputFile outputFile);` is marked as deprecated. This method still needs to be modified to `abstract` after the `newAppender(PartitionSpec spec, OutputFile outputFile)` is removed in 1.5.0. However, it will break the API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1375148065


##########
core/src/main/java/org/apache/iceberg/ManifestFiles.java:
##########
@@ -157,11 +157,34 @@ public static ManifestWriter<DataFile> write(PartitionSpec spec, OutputFile outp
    */
   public static ManifestWriter<DataFile> write(
       int formatVersion, PartitionSpec spec, OutputFile outputFile, Long snapshotId) {
+    return write(formatVersion, spec, outputFile, snapshotId, null, null);
+  }
+
+  /**
+   * Create a new {@link ManifestWriter} for the given format version.
+   *
+   * @param formatVersion a target format version
+   * @param spec a {@link PartitionSpec}
+   * @param outputFile an {@link OutputFile} where the manifest will be written
+   * @param snapshotId a snapshot ID for the manifest entries, or null for an inherited ID
+   * @param compressionCodec compression codec for the manifest file
+   * @param compressionLevel compression level of the compressionCodec
+   * @return a manifest writer
+   */
+  public static ManifestWriter<DataFile> write(

Review Comment:
   @sumeetgajjar originally used a `Map` parameter for this , and @rdblue commented that "we don't want to pass a map of properties around. That's exposing too much where it doesn't need to be, and people tend to misuse generic arguments like this."
   What I propose to do then is to introduce a `ManifestWriter.Options` class and use that here (instead of a `Map`). I'll also introduce a `ManifestListWriter.Options` class and use that in `ManifestLists.write`. These `Options` classes define what additional parameters are applicable and may be set. If in future, additional parameters are needed, they can be added to these `Options` classes.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1833208488

   @nastra @aokolnychyi I have rebased on main and resolved the conflicts with the `RewriteManifestsSparkAction` refactoring. As I mentioned before, I have introduced `ManifestWriter.Options` and `ManifestListWriter.Options` to be passed to `ManifestFiles.write` and `ManifestLists.write`. These `Options` classes provide defined options (currently just compression codec and level but extensible in future) that may be passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1103188491


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +165,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  static final Map<String, String> CODEC_METADATA_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put("uncompressed", "null")
+          .put("zstd", "zstandard")
+          .put("gzip", "deflate")
+          .build();
+
+  static final String AVRO_CODEC_KEY = "avro.codec";
+
+  static final long SNAPSHOT_ID = 987134631982734L;
+
+  private static final long SEQUENCE_NUMBER = 34L;

Review Comment:
   I would be fine with simply using the long value instead of defining this constant, as it is only used once.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106574636


##########
core/src/main/java/org/apache/iceberg/ManifestListWriter.java:
##########
@@ -31,14 +31,18 @@
 abstract class ManifestListWriter implements FileAppender<ManifestFile> {
   private final FileAppender<ManifestFile> writer;
 
-  private ManifestListWriter(OutputFile file, Map<String, String> meta) {
-    this.writer = newAppender(file, meta);
+  private ManifestListWriter(
+      OutputFile file,
+      Map<String, String> meta,
+      String compressionCodec,
+      Integer compressionLevel) {
+    this.writer = newAppender(file, meta, compressionCodec, compressionLevel);
   }
 
   protected abstract ManifestFile prepare(ManifestFile manifest);
 
   protected abstract FileAppender<ManifestFile> newAppender(
-      OutputFile file, Map<String, String> meta);
+      OutputFile file, Map<String, String> meta, String compressionCodec, Integer compressionLevel);

Review Comment:
   This is not what's causing the revapi breakage; it is the `newAppender` in `ManifestWriter`.
   `ManifestListWriter` is package private, not public, so the change is not an API break.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111314583


##########
core/src/main/java/org/apache/iceberg/ManifestLists.java:
##########
@@ -21,51 +21,79 @@
 import java.io.IOException;
 import java.util.List;
 import org.apache.iceberg.avro.Avro;
+import org.apache.iceberg.avro.AvroIterable;
 import org.apache.iceberg.exceptions.RuntimeIOException;
 import org.apache.iceberg.io.CloseableIterable;
 import org.apache.iceberg.io.InputFile;
 import org.apache.iceberg.io.OutputFile;
+import org.apache.iceberg.relocated.com.google.common.annotations.VisibleForTesting;
 import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 class ManifestLists {
   private ManifestLists() {}
 
   static List<ManifestFile> read(InputFile manifestList) {
-    try (CloseableIterable<ManifestFile> files =
-        Avro.read(manifestList)
-            .rename("manifest_file", GenericManifestFile.class.getName())
-            .rename("partitions", GenericPartitionFieldSummary.class.getName())
-            .rename("r508", GenericPartitionFieldSummary.class.getName())
-            .classLoader(GenericManifestFile.class.getClassLoader())
-            .project(ManifestFile.schema())
-            .reuseContainers(false)
-            .build()) {
-
+    try (CloseableIterable<ManifestFile> files = manifestFileIterable(manifestList)) {
       return Lists.newLinkedList(files);
-

Review Comment:
   Unnecessary whitespace change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116506860


##########
core/src/test/java/org/apache/iceberg/TestManifestListWriter.java:
##########
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.iceberg;
+
+import java.io.IOException;
+import java.util.Map;
+import org.apache.avro.file.DataFileConstants;
+import org.apache.iceberg.avro.AvroIterable;
+import org.apache.iceberg.io.InputFile;
+import org.assertj.core.api.Assertions;
+import org.junit.Assume;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+
+@RunWith(Parameterized.class)
+public class TestManifestListWriter extends TableTestBase {
+
+  @Parameterized.Parameters(name = "formatVersion = {0}")
+  public static Object[] parameters() {
+    return new Object[] {1, 2};
+  }
+
+  public TestManifestListWriter(int formatVersion) {
+    super(formatVersion);
+  }
+
+  @Test
+  public void testWriteManifestListWithCompression() throws IOException {
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          return writeManifestList(compressionCodec, manifest);
+        });
+  }
+
+  @Test
+  public void testWriteDeleteManifestListWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);
+    validateManifestListCompressionCodec(
+        compressionCodec -> {
+          ManifestFile manifest = writeManifest(SNAPSHOT_ID, compressionCodec, FILE_A);
+          ManifestFile deleteManifest =
+              writeDeleteManifest(formatVersion, SNAPSHOT_ID, compressionCodec, FILE_A_DELETES);
+          return writeManifestList(compressionCodec, manifest, deleteManifest);
+        });
+  }
+
+  void validateManifestListCompressionCodec(
+      CheckedFunction<String, InputFile> createManifestListFunc) throws IOException {

Review Comment:
   I changed the validate methods in `TestManifestWriter` and `TestManifestListWriter` so that they no longer took functions. Now they take a boolean flag that determines how to write the manifests/manifest lists.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116504911


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -733,4 +803,9 @@ void assertEquals(String context, Object expected, Object actual) {
   protected interface Action {
     void invoke();
   }
+
+  @FunctionalInterface
+  interface CheckedFunction<T, R> {

Review Comment:
   Removed, as I refactored the methods that used it, so it's no longer needed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116192787


##########
versions.props:
##########
@@ -48,3 +48,4 @@ com.esotericsoftware:kryo = 4.0.2
 org.eclipse.jetty:* = 9.4.43.v20210629
 org.testcontainers:* = 1.17.5
 io.delta:delta-core_* = 2.2.0
+com.github.luben:zstd-jni = 1.5.4-1

Review Comment:
   However, this dependency is only needed at test runtime, so I'll change the dependency from `testImplementation` to `testRuntimeOnly`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1442280977

   @rdblue thank you for reviewing. I have just been on a short vacation. I'll look into your comments.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1460748844

   @rdblue can you please review this again? @nastra has approved it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332518184


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -229,18 +252,34 @@ protected ManifestEntry<DataFile> prepare(ManifestEntry<DataFile> entry) {
     @Override
     protected FileAppender<ManifestEntry<DataFile>> newAppender(
         PartitionSpec spec, OutputFile file) {
+      return newAppender(spec, file, null, null);
+    }
+
+    @Override
+    protected FileAppender<ManifestEntry<DataFile>> newAppender(
+        PartitionSpec spec, OutputFile file, String compressionCodec, Integer compressionLevel) {
       Schema manifestSchema = V2Metadata.entrySchema(spec.partitionType());
       try {
-        return Avro.write(file)
-            .schema(manifestSchema)
-            .named("manifest_entry")
-            .meta("schema", SchemaParser.toJson(spec.schema()))
-            .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
-            .meta("partition-spec-id", String.valueOf(spec.specId()))
-            .meta("format-version", "2")
-            .meta("content", "data")
-            .overwrite()
-            .build();
+        Avro.WriteBuilder builder =
+            Avro.write(file)
+                .schema(manifestSchema)
+                .named("manifest_entry")
+                .meta("schema", SchemaParser.toJson(spec.schema()))
+                .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
+                .meta("partition-spec-id", String.valueOf(spec.specId()))
+                .meta("format-version", "2")
+                .meta("content", "data")
+                .overwrite();
+
+        if (compressionCodec != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION, compressionCodec);
+        }
+
+        if (compressionLevel != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION_LEVEL, compressionLevel.toString());

Review Comment:
   Thanks for the background.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1333556985


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -52,20 +52,38 @@ public abstract class ManifestWriter<F extends ContentFile<F>> implements FileAp
   private long deletedRows = 0L;
   private Long minDataSequenceNumber = null;
 
-  private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
+  private ManifestWriter(
+      PartitionSpec spec,
+      OutputFile file,
+      Long snapshotId,
+      String compressionCodec,
+      Integer compressionLevel) {
     this.file = file;
     this.specId = spec.specId();
-    this.writer = newAppender(spec, file);
+    this.writer = newAppender(spec, file, compressionCodec, compressionLevel);
     this.snapshotId = snapshotId;
     this.reused = new GenericManifestEntry<>(spec.partitionType());
     this.stats = new PartitionSummary(spec);
   }
 
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
+  /**
+   * @deprecated since 1.4.0, will be removed in 1.5.0; use {@link
+   *     ManifestWriter#newAppender(PartitionSpec, OutputFile, String, Integer)} instead.
+   */
+  @Deprecated
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
       PartitionSpec spec, OutputFile outputFile);
 
+  protected FileAppender<ManifestEntry<F>> newAppender(

Review Comment:
   @ConeyLiu I see your point.
   @nastra recommended making the new `newAppender` with 4 parameters non-abstract (calling the old `newAppender` with 2 parameters) to avoid breaking the API. However, as you point out, when we remove the deprecated 2-parameter `newAppender` in a future release, this 4-parameter `newAppender` will have to become abstract, which will break the API then. We either break the API now or later. @nastra what do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332295801


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -229,18 +252,34 @@ protected ManifestEntry<DataFile> prepare(ManifestEntry<DataFile> entry) {
     @Override
     protected FileAppender<ManifestEntry<DataFile>> newAppender(
         PartitionSpec spec, OutputFile file) {
+      return newAppender(spec, file, null, null);
+    }
+
+    @Override
+    protected FileAppender<ManifestEntry<DataFile>> newAppender(
+        PartitionSpec spec, OutputFile file, String compressionCodec, Integer compressionLevel) {
       Schema manifestSchema = V2Metadata.entrySchema(spec.partitionType());
       try {
-        return Avro.write(file)
-            .schema(manifestSchema)
-            .named("manifest_entry")
-            .meta("schema", SchemaParser.toJson(spec.schema()))
-            .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
-            .meta("partition-spec-id", String.valueOf(spec.specId()))
-            .meta("format-version", "2")
-            .meta("content", "data")
-            .overwrite()
-            .build();
+        Avro.WriteBuilder builder =
+            Avro.write(file)
+                .schema(manifestSchema)
+                .named("manifest_entry")
+                .meta("schema", SchemaParser.toJson(spec.schema()))
+                .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
+                .meta("partition-spec-id", String.valueOf(spec.specId()))
+                .meta("format-version", "2")
+                .meta("content", "data")
+                .overwrite();
+
+        if (compressionCodec != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION, compressionCodec);
+        }
+
+        if (compressionLevel != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION_LEVEL, compressionLevel.toString());

Review Comment:
   On the other hand, I agree that making compressionLevel a String is more efficient due to how the implementation works. One can make an argument for either case, it just depends on what you prioritize.
   I don't have a strong opinion on this. For now, I'll keep this as it is.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1332481736


##########
core/src/main/java/org/apache/iceberg/avro/Avro.java:
##########
@@ -69,7 +69,7 @@
 public class Avro {
   private Avro() {}
 
-  private enum Codec {
+  public enum Codec {

Review Comment:
   Ok, I'll do so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331435020


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -229,18 +252,34 @@ protected ManifestEntry<DataFile> prepare(ManifestEntry<DataFile> entry) {
     @Override
     protected FileAppender<ManifestEntry<DataFile>> newAppender(
         PartitionSpec spec, OutputFile file) {
+      return newAppender(spec, file, null, null);
+    }
+
+    @Override
+    protected FileAppender<ManifestEntry<DataFile>> newAppender(
+        PartitionSpec spec, OutputFile file, String compressionCodec, Integer compressionLevel) {
       Schema manifestSchema = V2Metadata.entrySchema(spec.partitionType());
       try {
-        return Avro.write(file)
-            .schema(manifestSchema)
-            .named("manifest_entry")
-            .meta("schema", SchemaParser.toJson(spec.schema()))
-            .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
-            .meta("partition-spec-id", String.valueOf(spec.specId()))
-            .meta("format-version", "2")
-            .meta("content", "data")
-            .overwrite()
-            .build();
+        Avro.WriteBuilder builder =
+            Avro.write(file)
+                .schema(manifestSchema)
+                .named("manifest_entry")
+                .meta("schema", SchemaParser.toJson(spec.schema()))
+                .meta("partition-spec", PartitionSpecParser.toJsonFields(spec))
+                .meta("partition-spec-id", String.valueOf(spec.specId()))
+                .meta("format-version", "2")
+                .meta("content", "data")
+                .overwrite();
+
+        if (compressionCodec != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION, compressionCodec);
+        }
+
+        if (compressionLevel != null) {
+          builder.set(TableProperties.AVRO_COMPRESSION_LEVEL, compressionLevel.toString());

Review Comment:
   Why not pass a string argument? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331085946


##########
core/src/test/java/org/apache/iceberg/TestManifestWriter.java:
##########
@@ -384,6 +386,17 @@ private void checkManifests(
     }
   }
 
+  @Test
+  public void testWriteManifestWithCompression() throws IOException {
+    validateManifestCompressionCodec(false);
+  }
+
+  @Test
+  public void testWriteDeleteManifestWithCompression() throws IOException {
+    Assume.assumeTrue("delete files are only written for format version > 1", formatVersion > 1);

Review Comment:
   same as above



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ConeyLiu commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "ConeyLiu (via GitHub)" <gi...@apache.org>.
ConeyLiu commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331432655


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -52,20 +52,38 @@ public abstract class ManifestWriter<F extends ContentFile<F>> implements FileAp
   private long deletedRows = 0L;
   private Long minDataSequenceNumber = null;
 
-  private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
+  private ManifestWriter(
+      PartitionSpec spec,
+      OutputFile file,
+      Long snapshotId,
+      String compressionCodec,
+      Integer compressionLevel) {
     this.file = file;
     this.specId = spec.specId();
-    this.writer = newAppender(spec, file);
+    this.writer = newAppender(spec, file, compressionCodec, compressionLevel);
     this.snapshotId = snapshotId;
     this.reused = new GenericManifestEntry<>(spec.partitionType());
     this.stats = new PartitionSummary(spec);
   }
 
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
+  /**
+   * @deprecated since 1.4.0, will be removed in 1.5.0; use {@link
+   *     ManifestWriter#newAppender(PartitionSpec, OutputFile, String, Integer)} instead.
+   */
+  @Deprecated
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
       PartitionSpec spec, OutputFile outputFile);
 
+  protected FileAppender<ManifestEntry<F>> newAppender(

Review Comment:
   Should this be an abstract and use this as the default implementation for the above method?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1727085658

   @ConeyLiu since you had a similar PR, could you review this one as well please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1331095932


##########
core/src/main/java/org/apache/iceberg/avro/Avro.java:
##########
@@ -69,7 +69,7 @@
 public class Avro {
   private Avro() {}
 
-  private enum Codec {
+  public enum Codec {

Review Comment:
   in that case maybe let's make `Codec` private again and use plain strings in the test?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1732696046

   @nastra if you don't like https://github.com/apache/iceberg/pull/8617 can you please merge this instead? I have addressed all your feedback.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108812573


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   I fail to see how that helps:
   ```
     protected FileAppender<ManifestEntry<F>> newAppender(
         PartitionSpec spec,
         OutputFile outputFile,
         String compressionCodec,
         Integer compressionLevel) {
       return newAppender(spec, outputFile);
     }
   ```
   We want to remove the `newAppender` with two parameters in the future release, at which time the above has to be changed. We really want to make it abstract. At that point, we will have a revapi breakage, wouldn't we? I fail to see how we can avoid the revapi breakage.
   @rdblue @danielcweeks ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1109665669


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   the example you mentioned is quite old and this was before we were stricter on API breakages. 
   As part of #6146 we went back and fixed API breakages by first deprecating them first. In #6274 we then went ahead and removed those deprecations.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1108824544


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -307,6 +353,30 @@ ManifestFile writeDeleteManifest(int newFormatVersion, Long snapshotId, DeleteFi
     return writer.toManifestFile();
   }
 
+  InputFile writeManifestList(String compressionCodec, ManifestFile... manifestFiles)
+      throws IOException {
+    File manifestListFile = temp.newFile();
+    Assert.assertTrue(manifestListFile.delete());
+    OutputFile outputFile =
+        org.apache.iceberg.Files.localOutput(
+            FileFormat.AVRO.addExtension(manifestListFile.toString()));
+
+    try (FileAppender<ManifestFile> writer =
+        ManifestLists.write(
+            formatVersion,
+            outputFile,
+            SNAPSHOT_ID,
+            SNAPSHOT_ID - 1,
+            formatVersion > 1 ? 34L : 0,

Review Comment:
   This was written by Sumeet and I don't know why he used 34L, but I guess so.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111314373


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -52,20 +52,38 @@
   private long deletedRows = 0L;
   private Long minDataSequenceNumber = null;
 
-  private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
+  private ManifestWriter(
+      PartitionSpec spec,
+      OutputFile file,
+      Long snapshotId,
+      String compressionCodec,
+      Integer compressionLevel) {
     this.file = file;
     this.specId = spec.specId();
-    this.writer = newAppender(spec, file);
+    this.writer = newAppender(spec, file, compressionCodec, compressionLevel);
     this.snapshotId = snapshotId;
     this.reused = new GenericManifestEntry<>(spec.partitionType());
     this.stats = new PartitionSummary(spec);
   }
 
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
+  /**
+   * @deprecated since 1.2.0, will be removed in 1.3.0; use {@link
+   *     ManifestWriter#newAppender(PartitionSpec, OutputFile, String, Integer)} instead.
+   */
+  @Deprecated

Review Comment:
   Why remove this? I don't think it makes sense to force people to set compression and level.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1111315927


##########
versions.props:
##########
@@ -48,3 +48,4 @@ com.esotericsoftware:kryo = 4.0.2
 org.eclipse.jetty:* = 9.4.43.v20210629
 org.testcontainers:* = 1.17.5
 io.delta:delta-core_* = 2.2.0
+com.github.luben:zstd-jni = 1.5.4-1

Review Comment:
   Why is this new dependency required?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1106886325


##########
core/src/main/java/org/apache/iceberg/ManifestWriter.java:
##########
@@ -64,7 +69,7 @@ private ManifestWriter(PartitionSpec spec, OutputFile file, Long snapshotId) {
   protected abstract ManifestEntry<F> prepare(ManifestEntry<F> entry);
 
   protected abstract FileAppender<ManifestEntry<F>> newAppender(
-      PartitionSpec spec, OutputFile outputFile);
+      PartitionSpec spec, OutputFile outputFile, String compressionCodec, Integer compressionLevel);

Review Comment:
   I believe it should be as I outlined in https://github.com/apache/iceberg/commit/1b237d570197d52fc0c7d959abc28ff14830d9aa. After one minor release we can then clean up the API.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "nastra (via GitHub)" <gi...@apache.org>.
nastra commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1105640786


##########
.palantir/revapi.yml:
##########
@@ -261,6 +261,16 @@ acceptedBreaks:
         \ T) throws java.io.IOException, com.fasterxml.jackson.core.JacksonException\
         \ @ org.apache.iceberg.rest.RESTSerializers.UpdateRequirementDeserializer"
       justification: "False positive - JacksonException is a subclass of IOException"
+    - code: "java.method.numberOfParametersChanged"
+      old: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile)"
+      new: "method org.apache.iceberg.io.FileAppender<org.apache.iceberg.ManifestEntry<F>>\
+        \ org.apache.iceberg.ManifestWriter<F extends org.apache.iceberg.ContentFile<F\
+        \ extends org.apache.iceberg.ContentFile<F>>>::newAppender(org.apache.iceberg.PartitionSpec,\
+        \ org.apache.iceberg.io.OutputFile, java.lang.String, java.lang.Integer)"

Review Comment:
   rather than having to break the API I think it would be better to define a new `newAppender(..)` method that by default calls the original one. See also my comment further below.
   
   Once this is done, you shouldn't have any diff on `revapi.yml` anymore.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] wypoon commented on a diff in pull request #6799: Core: Use avro compression properties from table properties when writing manifests and manifest lists

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on code in PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#discussion_r1116302161


##########
core/src/test/java/org/apache/iceberg/TableTestBase.java:
##########
@@ -162,6 +167,19 @@ public class TableTestBase {
 
   static final FileIO FILE_IO = new TestTables.LocalFileIO();
 
+  // Mapping of Avro codec name used by Iceberg to name used by Avro (and appearing in Avro metadata
+  // under the key, avro.codec).
+  // In tests, we use the Iceberg name to specify the codec, and we verify the codec used by reading
+  // the Avro metadata and checking for the Avro name in avro.codec.
+  static final Map<String, String> AVRO_CODEC_NAME_MAPPING =
+      ImmutableMap.<String, String>builder()
+          .put(Avro.Codec.UNCOMPRESSED.name(), DataFileConstants.NULL_CODEC)
+          .put(Avro.Codec.ZSTD.name(), DataFileConstants.ZSTANDARD_CODEC)
+          .put(Avro.Codec.GZIP.name(), DataFileConstants.DEFLATE_CODEC)
+          .build();
+
+  static final long SNAPSHOT_ID = 987134631982734L;

Review Comment:
   Hmm, I do not see any tests with `DUMMY_SNAPSHOT_ID`. The only occurrence of `DUMMY_SNAPSHOT_ID` I found is as a private constant in `FlinkManifestUtil` for `0L`.
   I'll rename the `SNAPSHOT_ID` here to `EXAMPLE_SNAPSHOT_ID`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


Re: [PR] Core: Use avro compression properties from table properties when writing manifests and manifest lists [iceberg]

Posted by "wypoon (via GitHub)" <gi...@apache.org>.
wypoon commented on PR #6799:
URL: https://github.com/apache/iceberg/pull/6799#issuecomment-1785524764

   @nastra @aokolnychyi please let me know what you think.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org