You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/08/02 03:03:07 UTC

[GitHub] [iceberg] HeartSaVioR opened a new pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

HeartSaVioR opened a new pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285


   This patch is spin-off version of #1280 which excludes refactoring, to be included in bugfix version release.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668905254


   The test failures are due to checkstyle:
   ```
   > Task :iceberg-spark:checkstyleTest FAILED
   [ant:checkstyle] [ERROR] /home/travis/build/apache/iceberg/spark/src/test/java/org/apache/iceberg/TestScanTaskSerialization.java:1: Line does not match expected header line of '^/\*$'. [RegexpHeader]
   [ant:checkstyle] [ERROR] /home/travis/build/apache/iceberg/spark/src/test/java/org/apache/iceberg/TestScanTaskSerialization.java:23: Wrong order for 'java.io.ByteArrayInputStream' import. [ImportOrder]
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r464757208



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -19,21 +19,22 @@
 
 package org.apache.iceberg;
 
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
 import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
-import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
   private final List<FileScanTask> tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = Arrays.asList(tasks);
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = copyList(tasks);

Review comment:
       Ah OK that also makes sense - I realized we'd probably want to make sure `tasks` cannot be modified but `files()` opens it for modification outside of this class.
   
   I simply thought about defining `tasks` as `ArrayList<FileScanTask>` which also ensures no problem with serialization, but the efforts would be similar (had to make change on constructors and `files()`) so either is fine for me. I'll go through changing it to array first. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668963293


   OK finally Travis CI passed. Sorry for the noises - please take a look again. Thanks in advance!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r464909482



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -26,19 +26,19 @@
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
-  private final List<FileScanTask> tasks;
+  private final FileScanTask[] tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = tasks;
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = tasks.stream().toArray(FileScanTask[]::new);

Review comment:
       I guess the new code will also throw NPE in any way, but with different error message which is probably less helpful. Checking the null with Precondition sounds good to me, and I prefer defensive programming unless it's on critical path. :) Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-667618141


   cc. @rdblue 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r465201829



##########
File path: spark/src/test/java/org/apache/iceberg/actions/TestRewriteDataFilesAction.java
##########
@@ -311,6 +325,128 @@ public void testRewriteLargeTableHasResiduals() {
     Assert.assertEquals("Rows must match", records, actualRecords);
   }
 
+  @Test
+  public void testBaseCombinedScanTaskKryoSerialization() throws Exception {

Review comment:
       Why add these tests to `TestRewriteDataFilesAction`? These aren't specific to that action and the code change is in core. I think it would be more better to add these tests to `TestDataFileSerialization`, since that is quite similar. That, or create a new test suite, `TestScanTaskSerialization`. Either one is fine with me.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r465202383



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -23,22 +23,24 @@
 import java.util.List;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
 import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
+import org.apache.iceberg.relocated.com.google.common.base.Preconditions;
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
-  private final List<FileScanTask> tasks;
+  private final FileScanTask[] tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = tasks;
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    Preconditions.checkNotNull(tasks, "tasks cannot be null.");

Review comment:
       Nit: We try to keep error messages small, so we avoid unnecessary punctuation and words. We can remove the `.` here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-669616769


   My pleasure. Thanks for reviewing and merging!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r464550504



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -19,21 +19,22 @@
 
 package org.apache.iceberg;
 
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
 import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
-import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
   private final List<FileScanTask> tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = Arrays.asList(tasks);
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = copyList(tasks);

Review comment:
       What about a slightly different approach?
   
   Right now, the main problem is that `List` can't be easily serialized. But we know that `tasks` are `FileScanTask` and never changes, so we could use `private final FileScanTask[] tasks` instead, which had no problems with serialization.
   
   Then this would just need to copy into the array, `tasks.stream().toArray(FileScanTask[]::new)`, and we could update `files` to return `ImmutableList.copyOf(tasks)`.
   
   By making those changes, we still end up with an immutable list returned by `files`, but take care of the serialization issue and don't need to add `copyList`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668905254


   The test failures are due to checkstyle:


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668363696


   Given we get rid of necessary on refactoring other places (except tests), would we like to consolidate the refactor part on test side as well? Or still want to keep refactoring on test part be separated PR?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-669300591


   Merged. Thanks for fixing this, @HeartSaVioR!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r465408377



##########
File path: spark/src/test/java/org/apache/iceberg/TestScanTaskSerialization.java
##########
@@ -0,0 +1,191 @@
+package org.apache.iceberg;

Review comment:
       This file needs the Apache license header, which you can copy from another file.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668834457


   @HeartSaVioR, that sounds reasonable to me. Let's keep them in a new suite in Spark then.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR edited a comment on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668913654


   Thanks for checking the build result. My bad. I thought I have checkstyle plugin running in IDEA but it isn't. Fixed it, and confirmed with `gradle check -x test`. 
   
   Btw I keep failing tests on hive/flink module in my local so I had to run `gradle build` and see spark module succeeded to pass tests - checkstyle seems to come later hence couldn't catch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668831407


   While I can extract out the tests to the new test suite (will update soon), it doesn't seem to be possible to move them to core module, as we need to couple the test with Kyro (especially Spark KyroSerializer), right? That's what I understand of why other serialization test suites are also placed in spark module. Please let me know if I'm missing something.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] kbendick commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
kbendick commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r464851326



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -26,19 +26,19 @@
 import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
-  private final List<FileScanTask> tasks;
+  private final FileScanTask[] tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = tasks;
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = tasks.stream().toArray(FileScanTask[]::new);

Review comment:
       In Guava, `ImmutableList.copyOf()` will throw on null input. That same behavior isn't mentioned here. Should we document it, either via a `Preconditions` check or via an annotation such as `@ParametersAreNonNullByDefault` which Guava uses.
   
   Note: I can't speak for all versions of Guava obviously (I doubt anybody could at this point), but I found this isue mentioning it https://github.com/google/guava/issues/2470




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668932391


   Test failures seem to be related - probably I miss something on new test suite, though I think I copied all necessary stuffs from TestRewriteDataFilesAction. Will recheck and ensure the Travis CI pass.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zhangdove commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
zhangdove commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-667939169


   Thank you for your PR, I met the same problem today, and the current PR solved my problem. @HeartSaVioR 👍


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668718438


   Thanks @HeartSaVioR! The implementation looks good, but I noticed that the tests are in the wrong place. We just need to move those to core and then I'll merge this.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r465334038



##########
File path: spark/src/test/java/org/apache/iceberg/actions/TestRewriteDataFilesAction.java
##########
@@ -311,6 +325,128 @@ public void testRewriteLargeTableHasResiduals() {
     Assert.assertEquals("Rows must match", records, actualRecords);
   }
 
+  @Test
+  public void testBaseCombinedScanTaskKryoSerialization() throws Exception {

Review comment:
       I think I missed to migrate comments from the origin PR, my bad.
   
   > I decided to add the serde tests of BaseCombinedScanTask here, because it seems to require non-trivial efforts to create BasedCombinedScanTask by hand. I can move out if we'd like to have separate suite with manually crafted test objects.
   
   So that was the plan to reduce lots of manual lines by hand, but now I realized the new test cases would just duplicate small code in TestRewriteDataFilesAction even it becomes a new test suite. I'll add a new test suite. Thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668913654


   My bad. I thought I have checkstyle plugin running in IDEA but it isn't. Fixed it, and confirmed with `gradle check -x test`.
   
   Btw I keep failing tests on hive/flink module in my local so I had to run `gradle build` and see spark module succeeded to pass tests - checkstyle seems to come later hence couldn't catch.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] HeartSaVioR commented on pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#issuecomment-668853089


   Yeah I initially wondered why they exist in spark module, and given the serialization issue is no longer spark-only issue it'd be ideal to have such tests to core. I don't have an idea to do that though, as serializer we need to use for testing would be available in the execution (spark/flink) module.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1285: Core: fix serialization issue in BaseCombinedScanTask with Kyro (No refactor)

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1285:
URL: https://github.com/apache/iceberg/pull/1285#discussion_r464550504



##########
File path: core/src/main/java/org/apache/iceberg/BaseCombinedScanTask.java
##########
@@ -19,21 +19,22 @@
 
 package org.apache.iceberg;
 
+import java.util.Arrays;
 import java.util.Collection;
 import java.util.List;
 import org.apache.iceberg.relocated.com.google.common.base.Joiner;
 import org.apache.iceberg.relocated.com.google.common.base.MoreObjects;
-import org.apache.iceberg.relocated.com.google.common.collect.ImmutableList;
+import org.apache.iceberg.relocated.com.google.common.collect.Lists;
 
 public class BaseCombinedScanTask implements CombinedScanTask {
   private final List<FileScanTask> tasks;
 
   public BaseCombinedScanTask(FileScanTask... tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = Arrays.asList(tasks);
   }
 
   public BaseCombinedScanTask(List<FileScanTask> tasks) {
-    this.tasks = ImmutableList.copyOf(tasks);
+    this.tasks = copyList(tasks);

Review comment:
       What about a slightly different approach?
   
   Right now, the main problem is that `List` can't be easily serialized. But we know that `tasks` are `FileScanTask` and never changes, so we could use `private final FileScanTask[] tasks` instead, which has no problems with serialization.
   
   Then this would just need to copy into the array, `tasks.stream().toArray(FileScanTask[]::new)`, and we could update `files` to return `ImmutableList.copyOf(tasks)`.
   
   By making those changes, we still end up with an immutable list returned by `files`, but take care of the serialization issue and don't need to add `copyList`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org