You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/17 10:00:34 UTC

[GitHub] [iceberg] openinx opened a new pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

openinx opened a new pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343


   This PR is resolving this comment : https://github.com/apache/iceberg/pull/2258#discussion_r588809379


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597898862



##########
File path: core/src/main/java/org/apache/iceberg/LazyImmutableMap.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class LazyImmutableMap<K, V> implements Map<K, V>, Serializable {
+
+  private final Map<K, V> copiedMap;
+  private Map<K, V> immutableMap;
+
+  LazyImmutableMap() {
+    this.copiedMap = Maps.newLinkedHashMap();
+  }
+
+  private LazyImmutableMap(Map<K, V> map) {
+    this.copiedMap = Maps.newLinkedHashMap();

Review comment:
       Why use `LinkedHashMap`? Do we need to preserve key order for some reason?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810461257


   Merged. Thanks for fixing this, @openinx!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597900586



##########
File path: core/src/main/java/org/apache/iceberg/LazyImmutableMap.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class LazyImmutableMap<K, V> implements Map<K, V>, Serializable {

Review comment:
       This name doesn't quite work because this map isn't immutable. It can be used to get an immutable view of this map. Also, the `of` method copies the contents of the other map, so it would be better to name that `copyOf` instead.
   
   How about naming this class `SerializableMap` instead, since it is used to avoid problems with Kryo serialization?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dubeme commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
dubeme commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-974583421


   It seems this serialization bug persists when calling `add_files`... I created this issue showcasing my scenario #3566


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810685833


   I am afraid it does not solve our issues with Kryo. We still are going to fail if we get a non-serializable byte buffer.
   
   I think `SerializableByteBufferMap` fixes the problem for Java serialization but we don't really have a solution for Kryo. If we switch to `DirectByteBuffer` from `HeapByteBuffer`, the Kryo tests in this PR will fail.
   
   We probably have two options:
   - Require a custom serializer to be registered for non-serializable byte buffer implementations.
   - Modify `SerializableByteBufferMap` to store array bytes, which would mean eager conversion of byte buffers to arrays.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597899331



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -448,17 +444,16 @@ public String toString() {
         .add("partition", partitionData)
         .add("record_count", recordCount)
         .add("file_size_in_bytes", fileSizeInBytes)
-        .add("column_sizes", columnSizes)
-        .add("value_counts", valueCounts)
-        .add("null_value_counts", nullValueCounts)
-        .add("nan_value_counts", nanValueCounts)
-        .add("lower_bounds", lowerBounds)
-        .add("upper_bounds", upperBounds)
+        .add("column_sizes", toReadableMap(columnSizes))
+        .add("value_counts", toReadableMap(valueCounts))
+        .add("null_value_counts", toReadableMap(nullValueCounts))
+        .add("nan_value_counts", toReadableMap(nanValueCounts))
+        .add("lower_bounds", toReadableMap(lowerBounds))
+        .add("upper_bounds", toReadableMap(upperBounds))

Review comment:
       The original maps should work fine here. There's no risk of `add` modifying the map.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810686511


   Does my analysis sound right to you, @openinx @rdblue?
   
   cc others as well, @RussellSpitzer @yyanyy @jackye1995 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-807982656


   Let me rebase this PR to the latest commits, Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810840862


   > Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   
   OK,  from the class name `jdk.internal.ref.Cleaner`,  I guess you are testing this in openjdk-11 because that class was introduced since jdk11  ( see [here](https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/jdk/internal/ref/Cleaner.java)) ,  I run tests under jdk8 so I did not encounter this issue. 
   
   Any way ,  I think it's necessary to address this issue,  let me consider how to get this done. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-974903880


   @dubeme, that isn't the same problem. That issue is happening when you serialize `SparkPartition`, not a data file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597393058



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -155,7 +153,7 @@ public PartitionData copy() {
   /**
    * Copy constructor.
    *
-   * @param toCopy a generic data file to copy.
+   * @param toCopy   a generic data file to copy.

Review comment:
       Okay, will revert this change. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r600238771



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -309,17 +306,17 @@ public Object get(int i) {
       case 5:
         return fileSizeInBytes;
       case 6:
-        return columnSizes;
+        return toReadableMap(columnSizes);
       case 7:
-        return valueCounts;
+        return toReadableMap(valueCounts);
       case 8:
-        return nullValueCounts;
+        return toReadableMap(nullValueCounts);
       case 9:
-        return nanValueCounts;
+        return toReadableMap(nanValueCounts);
       case 10:
-        return lowerBounds;
+        return toReadableMap(lowerBounds);
       case 11:
-        return upperBounds;
+        return toReadableMap(upperBounds);

Review comment:
       OK, that make sense ! 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r604635167



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -169,13 +167,12 @@ public PartitionData copy() {
     this.recordCount = toCopy.recordCount;
     this.fileSizeInBytes = toCopy.fileSizeInBytes;
     if (fullCopy) {
-      // TODO: support lazy conversion to/from map
-      this.columnSizes = copy(toCopy.columnSizes);
-      this.valueCounts = copy(toCopy.valueCounts);
-      this.nullValueCounts = copy(toCopy.nullValueCounts);
-      this.nanValueCounts = copy(toCopy.nanValueCounts);
-      this.lowerBounds = SerializableByteBufferMap.wrap(copy(toCopy.lowerBounds));
-      this.upperBounds = SerializableByteBufferMap.wrap(copy(toCopy.upperBounds));
+      this.columnSizes = SerializableMap.copyOf(toCopy.columnSizes);
+      this.valueCounts = SerializableMap.copyOf(toCopy.valueCounts);
+      this.nullValueCounts = SerializableMap.copyOf(toCopy.nullValueCounts);
+      this.nanValueCounts = SerializableMap.copyOf(toCopy.nanValueCounts);
+      this.lowerBounds = SerializableByteBufferMap.wrap(SerializableMap.copyOf(toCopy.lowerBounds));

Review comment:
       We do need `SerializableByteBufferMap` for now as byte buffers may not be serializable. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810735680


   @aokolnychyi   I tried to run the unit tests by switching to use `DirectByteBuffer`,  both spark/flink `TestDataFileSerialization`   tests works fine.   So did I miss some thing in you last [comment](https://github.com/apache/iceberg/pull/2343#issuecomment-810685833),  I guess kryo will use the Java serializer to serialize & deserialize the `SerializableByteBufferMap`, but I need to confirm this by checking the code. 
   
   ```diff
   diff --git a/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java b/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   index ad67b181..4b43718d 100644
   --- a/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   +++ b/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   @@ -196,6 +196,6 @@ public class TestDataFileSerialization {
      }
    
      private static ByteBuffer longToBuffer(long value) {
   -    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
   +    return ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
      }
    }
   diff --git a/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java b/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   index f5c2990b..eb22873d 100644
   --- a/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   +++ b/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   @@ -164,6 +164,6 @@ public class TestDataFileSerialization {
      }
    
      private static ByteBuffer longToBuffer(long value) {
   -    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
   +    return ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
      }
    }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597299985



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -155,7 +153,7 @@ public PartitionData copy() {
   /**
    * Copy constructor.
    *
-   * @param toCopy a generic data file to copy.
+   * @param toCopy   a generic data file to copy.

Review comment:
       Nit: unnecessary whitespace changes can cause commit conflicts.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810768253


   Hm, both suites fail for me after making the change.
   
   Flink:
   
   ```
   java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   Serialization trace:
   cleaner (java.nio.DirectByteBuffer)
   lowerBounds (org.apache.iceberg.GenericDataFile)
   com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   Serialization trace:
   cleaner (java.nio.DirectByteBuffer)
   lowerBounds (org.apache.iceberg.GenericDataFile)
   	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:82)
   	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
   	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:599)
   	at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:95)
   	at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:21)
   	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:523)
   	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:61)
   	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:495)
   	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:599)
   	at org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.serialize(KryoSerializer.java:316)
   	at org.apache.iceberg.flink.TestDataFileSerialization.testDataFileKryoSerialization(TestDataFileSerialization.java:160)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
   	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
   	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
   	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
   	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
   	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
   	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
   	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
   	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
   	at java.base/java.lang.Thread.run(Thread.java:834)
   Caused by: java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   ```
   
   Spark:
   
   ```
   java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   Serialization trace:
   cleaner (java.nio.DirectByteBuffer)
   lowerBounds (org.apache.iceberg.GenericDataFile)
   com.esotericsoftware.kryo.KryoException: java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   Serialization trace:
   cleaner (java.nio.DirectByteBuffer)
   lowerBounds (org.apache.iceberg.GenericDataFile)
   	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:101)
   	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
   	at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:113)
   	at com.esotericsoftware.kryo.serializers.MapSerializer.write(MapSerializer.java:39)
   	at com.esotericsoftware.kryo.Kryo.writeObject(Kryo.java:575)
   	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:79)
   	at com.esotericsoftware.kryo.serializers.FieldSerializer.write(FieldSerializer.java:508)
   	at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:651)
   	at org.apache.iceberg.TestDataFileSerialization.testDataFileKryoSerialization(TestDataFileSerialization.java:109)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
   	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
   	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
   	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
   	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
   	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
   	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
   	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
   	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
   	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
   	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
   	at java.base/java.lang.Thread.run(Thread.java:834)
   Caused by: java.lang.IllegalArgumentException: Unable to create serializer "com.esotericsoftware.kryo.serializers.FieldSerializer" for class: jdk.internal.ref.Cleaner
   	at com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:65)
   	at com.esotericsoftware.kryo.factories.ReflectionSerializerFactory.makeSerializer(ReflectionSerializerFactory.java:43)
   	at com.esotericsoftware.kryo.Kryo.newDefaultSerializer(Kryo.java:396)
   	at com.twitter.chill.KryoBase.newDefaultSerializer(KryoBase.scala:50)
   	at com.esotericsoftware.kryo.Kryo.getDefaultSerializer(Kryo.java:380)
   	at com.esotericsoftware.kryo.util.DefaultClassResolver.registerImplicit(DefaultClassResolver.java:74)
   	at com.esotericsoftware.kryo.Kryo.getRegistration(Kryo.java:508)
   	at com.esotericsoftware.kryo.util.DefaultClassResolver.writeClass(DefaultClassResolver.java:97)
   	at com.esotericsoftware.kryo.Kryo.writeClass(Kryo.java:540)
   	at com.esotericsoftware.kryo.serializers.ObjectField.write(ObjectField.java:75)
   	... 58 more
   
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-811313374


   Let's collaborate, @openinx. I am looking into this as well. I feel the we have the same problem in Flink and Spark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r604314442



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -169,13 +167,12 @@ public PartitionData copy() {
     this.recordCount = toCopy.recordCount;
     this.fileSizeInBytes = toCopy.fileSizeInBytes;
     if (fullCopy) {
-      // TODO: support lazy conversion to/from map
-      this.columnSizes = copy(toCopy.columnSizes);
-      this.valueCounts = copy(toCopy.valueCounts);
-      this.nullValueCounts = copy(toCopy.nullValueCounts);
-      this.nanValueCounts = copy(toCopy.nanValueCounts);
-      this.lowerBounds = SerializableByteBufferMap.wrap(copy(toCopy.lowerBounds));
-      this.upperBounds = SerializableByteBufferMap.wrap(copy(toCopy.upperBounds));
+      this.columnSizes = SerializableMap.copyOf(toCopy.columnSizes);
+      this.valueCounts = SerializableMap.copyOf(toCopy.valueCounts);
+      this.nullValueCounts = SerializableMap.copyOf(toCopy.nullValueCounts);
+      this.nanValueCounts = SerializableMap.copyOf(toCopy.nanValueCounts);
+      this.lowerBounds = SerializableByteBufferMap.wrap(SerializableMap.copyOf(toCopy.lowerBounds));

Review comment:
       Later, we should check whether we need two wrappers, but it isn't a blocker here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-811314089


   I don't have a good solution so far. This is what I considered:
   
   - Require a custom serializer to be registered for non-serializable byte buffer implementations.
   - Modify `SerializableByteBufferMap` to store byte arrays, which would mean eager conversion of byte buffers to arrays.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597901399



##########
File path: core/src/main/java/org/apache/iceberg/BaseFile.java
##########
@@ -309,17 +306,17 @@ public Object get(int i) {
       case 5:
         return fileSizeInBytes;
       case 6:
-        return columnSizes;
+        return toReadableMap(columnSizes);
       case 7:
-        return valueCounts;
+        return toReadableMap(valueCounts);
       case 8:
-        return nullValueCounts;
+        return toReadableMap(nullValueCounts);
       case 9:
-        return nanValueCounts;
+        return toReadableMap(nanValueCounts);
       case 10:
-        return lowerBounds;
+        return toReadableMap(lowerBounds);
       case 11:
-        return upperBounds;
+        return toReadableMap(upperBounds);

Review comment:
       I don't think we need to use `toReadableMap` in this method because it is only used for serialization. We don't expose the Avro methods publicly, so we only need the unmodifiable maps to be returned by the `ContentFile` accessor methods.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r597901734



##########
File path: flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
##########
@@ -0,0 +1,201 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.flink;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.io.ObjectInputStream;
+import java.io.ObjectOutputStream;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.util.Map;
+import org.apache.flink.api.common.ExecutionConfig;
+import org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer;
+import org.apache.flink.core.memory.DataInputDeserializer;
+import org.apache.flink.core.memory.DataOutputSerializer;
+import org.apache.iceberg.DataFile;
+import org.apache.iceberg.DataFiles;
+import org.apache.iceberg.DeleteFile;
+import org.apache.iceberg.FileMetadata;
+import org.apache.iceberg.Metrics;
+import org.apache.iceberg.PartitionSpec;
+import org.apache.iceberg.Schema;
+import org.apache.iceberg.SortOrder;
+import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+import org.apache.iceberg.types.Types;
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.iceberg.types.Types.NestedField.optional;
+import static org.apache.iceberg.types.Types.NestedField.required;
+
+public class TestDataFileSerialization {

Review comment:
       Tests look good.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r600248417



##########
File path: core/src/main/java/org/apache/iceberg/LazyImmutableMap.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class LazyImmutableMap<K, V> implements Map<K, V>, Serializable {
+
+  private final Map<K, V> copiedMap;
+  private Map<K, V> immutableMap;
+
+  LazyImmutableMap() {
+    this.copiedMap = Maps.newLinkedHashMap();
+  }
+
+  private LazyImmutableMap(Map<K, V> map) {
+    this.copiedMap = Maps.newLinkedHashMap();

Review comment:
       OK, I was thinking that the map assertion will check the pairs insert order, so I used the `LinkedHashMap` here.   Read the code again, here we can use the `HashMap` directly.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi edited a comment on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi edited a comment on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810685833


   I am afraid it does not solve our issues with Kryo. We still are going to fail if we get a non-serializable byte buffer.
   
   I think `SerializableByteBufferMap` fixes the problem for Java serialization but we don't really have a solution for Kryo. If we switch to `DirectByteBuffer` from `HeapByteBuffer`, the Kryo tests in this PR will fail.
   
   We probably have two options:
   - Require a custom serializer to be registered for non-serializable byte buffer implementations.
   - Modify `SerializableByteBufferMap` to store byte arrays, which would mean eager conversion of byte buffers to arrays.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810698456


   Also, to answer why Spark is capable to handle unmodifiable collections: Spark registers by default com.twitter.chill serializers  that cover unmodifiable maps/lists. That's why our Spark tests work. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx commented on a change in pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx commented on a change in pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#discussion_r600248617



##########
File path: core/src/main/java/org/apache/iceberg/LazyImmutableMap.java
##########
@@ -0,0 +1,124 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg;
+
+import java.io.Serializable;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Map;
+import java.util.Set;
+import org.apache.iceberg.relocated.com.google.common.collect.Maps;
+
+public class LazyImmutableMap<K, V> implements Map<K, V>, Serializable {

Review comment:
       Sound good to me.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] openinx edited a comment on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
openinx edited a comment on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-810735680


   @aokolnychyi   I tried to run the unit tests by switching to use `DirectByteBuffer`,  both spark/flink `TestDataFileSerialization`   tests works fine (See following patch).   So did I miss some thing in you last [comment](https://github.com/apache/iceberg/pull/2343#issuecomment-810685833) ? I guess kryo will use the Java serializer to serialize & deserialize the `SerializableByteBufferMap`, but I need to confirm this by checking the code. 
   
   ```diff
   diff --git a/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java b/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   index ad67b181..4b43718d 100644
   --- a/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   +++ b/flink/src/test/java/org/apache/iceberg/flink/TestDataFileSerialization.java
   @@ -196,6 +196,6 @@ public class TestDataFileSerialization {
      }
    
      private static ByteBuffer longToBuffer(long value) {
   -    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
   +    return ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
      }
    }
   diff --git a/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java b/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   index f5c2990b..eb22873d 100644
   --- a/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   +++ b/spark/src/test/java/org/apache/iceberg/TestDataFileSerialization.java
   @@ -164,6 +164,6 @@ public class TestDataFileSerialization {
      }
    
      private static ByteBuffer longToBuffer(long value) {
   -    return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
   +    return ByteBuffer.allocateDirect(8).order(ByteOrder.LITTLE_ENDIAN).putLong(0, value);
      }
    }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dubeme edited a comment on pull request #2343: Core: Fix the kryo serialization issue in BaseFile.

Posted by GitBox <gi...@apache.org>.
dubeme edited a comment on pull request #2343:
URL: https://github.com/apache/iceberg/pull/2343#issuecomment-974583421


   It seems this serialization bug persists when calling `add_files`... I created this issue showcasing my scenario #3586


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org