You are viewing a plain text version of this content. The canonical link for it is here.

Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/01 02:21:40 UTC

[GitHub] [iceberg] kbendick opened a new pull request #1539: Add names to parameterized tests and simplify the parameters list

kbendick opened a new pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539


   For many of the tests that are using Parameterized from junit, our description winds up reading <methodname><index of parameter>.
   
   Here's an example
   
   ```md
   testOverwriteAnotherRangeWithinPartitionValidated[0]
   ```
   
   However, we'd like to have a format string to show what the parameter is and what its current value is in a given test run. This can be accomplished by using `name` in the `Parameterized.Parameters` section where test parameters to be looped over are placed.
   
   I also am taking the liberty to simplify some of the parameter definitions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498389613



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       Yes, we should point to the issue. Thanks for updating it.
   
   To clarify, this is for vectorized data reads with the _pos metadata column to get row position, not for reading metadata files. I think that this is being implemented in #1356.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498000518



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java
##########
@@ -408,6 +408,11 @@ public String name() {
         public TestInputFormat<T> create(Configuration conf) {
           return function.apply(conf);
         }
+
+        @Override
+        public String toString() {
+          return String.format("Test%s<T>", name());

Review comment:
       Ok. I've decided to leave it in as stands and I've created an issue to see if I can follow up on it so that we have the different types used in the parameter groups properly displayed in the tests: https://github.com/apache/iceberg/issues/1542




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497980888



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       Nope. None of the three tests pass with `true`.
   
   Here's a stack trace from `testReadRowNumbersWithFilter`:
   
   ```java
   Actual should be an InternalRow: schema
   java.lang.AssertionError: Actual should be an InternalRow: schema
   	at org.junit.Assert.fail(Assert.java:88)
   	at org.junit.Assert.assertTrue(Assert.java:41)
   	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:611)
   	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:600)
   	at org.apache.iceberg.spark.data.TestSparkParquetReadMetadataColumns.readAndValidate(TestSparkParquetReadMetadataColumns.java:212)
   	at org.apache.iceberg.spark.data.TestSparkParquetReadMetadataColumns.testReadRowNumbersWithFilter(TestSparkParquetReadMetadataColumns.java:165)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   ...
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497985570



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       I opened an issue to add support: https://github.com/apache/iceberg/issues/1540
   
   Possibly we can remove the discussion of `true` and just add a link to the issue? Is this even something we care about implementing (I believe likely yes if we allow metadata files to reach the GB size). Especially given that we had a discussion today in the meeting and later on slack about distributed query planning for metadata and file pruning started by @aokolnychyi. However, I would defer to your judgement about whether or not to keep the issue open.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick edited a comment on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick edited a comment on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701788175


   In some of the tests, one of the parameters is a String[] instead of a String. Because the current JUnit set up casts the names based on index by calling `.toString()` on them, we wind up with some bad names in these edge cases. Example:
   
   ```md
   testReplacePartitions[catalogName=testhadoop, baseNamespace=[Ljava.lang.String;36c1c7d9], format=PARQUET, isStreaming=false]
   ```
   
   We can fix this in a few ways. One, there is another library that offers us the ability to have more control over parameters: https://github.com/Pragmatists/JUnitParams
   
   However, I'm not so sure about that. Additionally, while baseNamespace is not _exactly_ a namespace, it is in a way. And there is a class `Namespace` with a meaningful `toString` method. We can either create an additional `BaseNamespace` class or simply delegate to `Namespace` and accept that a `BaseNamespace` and a `Namespace` are essentially just references to the different possible lengths of a namespace.
   
   What do others think @rdblue? `baseNamespace` as shown above is the only example I have found. And `Namespace` already has a helper function in the Flink test code:
   
   ```java
     private Namespace toNamespace(String database) {
       String[] namespace = new String[baseNamespace.length + 1];
       System.arraycopy(baseNamespace, 0, namespace, 0, baseNamespace.length);
       namespace[baseNamespace.length] = database;
       return Namespace.of(namespace);
     }
   ```  
   
   I can also create a separate issue for the ones that generate wonky names and we can deal with them / debate the meaning of `Namespace` in another PR since this one is already huge (and intended to be a simple swap one for one).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-702884661


   Thanks, @kbendick! This is really helpful.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498482759



##########
File path: data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java
##########
@@ -222,52 +222,48 @@ public void createParquetInputFile(List<Record> records) throws IOException {
   private final Object readValue;
   private final Object skipValue;
 
-  @Parameterized.Parameters
+  @Parameterized.Parameters(name = "format = {0} column = {1} readValue = {2} skipValue = {3}")
   public static Object[][] parameters() {
     return new Object[][] {
-        new Object[] { "parquet", "boolean", false, true },
-        new Object[] { "parquet", "int", 5, 55 },
-        new Object[] { "parquet", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "parquet", "float", 1.97f, 2.11f },
-        new Object[] { "parquet", "double", 2.11d, 1.97d },
-        new Object[] { "parquet", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
-        new Object[] { "parquet", "timestamp",
-            "2018-06-29T10:02:34.000000",
-            "2018-06-29T15:02:34.000000" },
-        new Object[] { "parquet", "timestamptz",
-            "2018-06-29T10:02:34.000000+00:00",
-            "2018-06-29T10:02:34.000000-07:00" },
-        new Object[] { "parquet", "string", "tapir", "monthly" },
-        // new Object[] { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
-        new Object[] { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
-        new Object[] { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
-        new Object[] { "parquet", "int_decimal", "77.77", "12.34" },
-        new Object[] { "parquet", "long_decimal", "88.88", "12.34" },
-        new Object[] { "parquet", "fixed_decimal", "99.99", "12.34" },
+        { "parquet", "boolean", false, true },
+        { "parquet", "int", 5, 55 },
+        { "parquet", "long", 5_000_000_049L, 5_000L },
+        { "parquet", "float", 1.97f, 2.11f },
+        { "parquet", "double", 2.11d, 1.97d },
+        { "parquet", "date", "2018-06-29", "2018-05-03" },
+        { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "parquet", "timestamp", "2018-06-29T10:02:34.000000", "2018-06-29T15:02:34.000000" },
+        { "parquet", "timestamptz", "2018-06-29T10:02:34.000000+00:00", "2018-06-29T10:02:34.000000-07:00" },
+        { "parquet", "string", "tapir", "monthly" },
+        // { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
+        { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
+        { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
+        { "parquet", "int_decimal", "77.77", "12.34" },
+        { "parquet", "long_decimal", "88.88", "12.34" },
+        { "parquet", "fixed_decimal", "99.99", "12.34" },
 
-        new Object[] { "orc", "boolean", false, true },
-        new Object[] { "orc", "int", 5, 55 },
-        new Object[] { "orc", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "orc", "float", 1.97f, 2.11f },
-        new Object[] { "orc", "double", 2.11d, 1.97d },
-        new Object[] { "orc", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "orc", "boolean", false, true },
+        { "orc", "int", 5, 55 },
+        { "orc", "long", 5_000_000_049L, 5_000L },
+        { "orc", "float", 1.97f, 2.11f },
+        { "orc", "double", 2.11d, 1.97d },
+        { "orc", "date", "2018-06-29", "2018-05-03" },
+        { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
         // Temporarily disable filters on Timestamp columns due to ORC-611
-        // new Object[] { "orc", "timestamp",
+        // { "orc", "timestamp",

Review comment:
       Turns out 1.7.0 is not published to maven. I am keeping these tests commented out with a mention of the corresponding still open HIVE ticket, as it indicates where in the OrcInputFormat for Hive is the issue. Might be of use to us.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497977641



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       I don't think that metadata columns have been implemented for vectorized Parquet yet? Does it pass if you uncomment it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497983639



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java
##########
@@ -408,6 +408,11 @@ public String name() {
         public TestInputFormat<T> create(Configuration conf) {
           return function.apply(conf);
         }
+
+        @Override
+        public String toString() {
+          return String.format("Test%s<T>", name());

Review comment:
       @rdblue should I remove this one and instead grab it in a follow up PR? Its just part of the test suites, but the template parameter T could be grabbed via java reflection - and it is different in some of the tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498376554



##########
File path: data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java
##########
@@ -222,52 +222,48 @@ public void createParquetInputFile(List<Record> records) throws IOException {
   private final Object readValue;
   private final Object skipValue;
 
-  @Parameterized.Parameters
+  @Parameterized.Parameters(name = "format = {0} column = {1} readValue = {2} skipValue = {3}")
   public static Object[][] parameters() {
     return new Object[][] {
-        new Object[] { "parquet", "boolean", false, true },
-        new Object[] { "parquet", "int", 5, 55 },
-        new Object[] { "parquet", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "parquet", "float", 1.97f, 2.11f },
-        new Object[] { "parquet", "double", 2.11d, 1.97d },
-        new Object[] { "parquet", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
-        new Object[] { "parquet", "timestamp",
-            "2018-06-29T10:02:34.000000",
-            "2018-06-29T15:02:34.000000" },
-        new Object[] { "parquet", "timestamptz",
-            "2018-06-29T10:02:34.000000+00:00",
-            "2018-06-29T10:02:34.000000-07:00" },
-        new Object[] { "parquet", "string", "tapir", "monthly" },
-        // new Object[] { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
-        new Object[] { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
-        new Object[] { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
-        new Object[] { "parquet", "int_decimal", "77.77", "12.34" },
-        new Object[] { "parquet", "long_decimal", "88.88", "12.34" },
-        new Object[] { "parquet", "fixed_decimal", "99.99", "12.34" },
+        { "parquet", "boolean", false, true },
+        { "parquet", "int", 5, 55 },
+        { "parquet", "long", 5_000_000_049L, 5_000L },
+        { "parquet", "float", 1.97f, 2.11f },
+        { "parquet", "double", 2.11d, 1.97d },
+        { "parquet", "date", "2018-06-29", "2018-05-03" },
+        { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "parquet", "timestamp", "2018-06-29T10:02:34.000000", "2018-06-29T15:02:34.000000" },
+        { "parquet", "timestamptz", "2018-06-29T10:02:34.000000+00:00", "2018-06-29T10:02:34.000000-07:00" },
+        { "parquet", "string", "tapir", "monthly" },
+        // { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
+        { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
+        { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
+        { "parquet", "int_decimal", "77.77", "12.34" },
+        { "parquet", "long_decimal", "88.88", "12.34" },
+        { "parquet", "fixed_decimal", "99.99", "12.34" },
 
-        new Object[] { "orc", "boolean", false, true },
-        new Object[] { "orc", "int", 5, 55 },
-        new Object[] { "orc", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "orc", "float", 1.97f, 2.11f },
-        new Object[] { "orc", "double", 2.11d, 1.97d },
-        new Object[] { "orc", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "orc", "boolean", false, true },
+        { "orc", "int", 5, 55 },
+        { "orc", "long", 5_000_000_049L, 5_000L },
+        { "orc", "float", 1.97f, 2.11f },
+        { "orc", "double", 2.11d, 1.97d },
+        { "orc", "date", "2018-06-29", "2018-05-03" },
+        { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
         // Temporarily disable filters on Timestamp columns due to ORC-611
-        // new Object[] { "orc", "timestamp",
+        // { "orc", "timestamp",

Review comment:
       I'm going to also try 1.7.0, but I'm wondering if we aren't fully excluding some ORC dependencies from other dependencies.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701877425


   Mostly looks good! Just a couple questions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497980888



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       Nope. None of the three tests pass with `true`.
   
   Here's a stack trace from `testReadRowNumbersWithFilter`:
   
   ```java
   Actual should be an InternalRow: schema
   java.lang.AssertionError: Actual should be an InternalRow: schema
   	at org.junit.Assert.fail(Assert.java:88)
   	at org.junit.Assert.assertTrue(Assert.java:41)
   	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:611)
   	at org.apache.iceberg.spark.data.TestHelpers.assertEquals(TestHelpers.java:600)
   	at org.apache.iceberg.spark.data.TestSparkParquetReadMetadataColumns.readAndValidate(TestSparkParquetReadMetadataColumns.java:212)
   	at org.apache.iceberg.spark.data.TestSparkParquetReadMetadataColumns.testReadRowNumbersWithFilter(TestSparkParquetReadMetadataColumns.java:165)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
   	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
   	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
   	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   	at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
   	at org.junit.rules.RunRules.evaluate(RunRules.java:20)
   	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
   	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
   	at org.junit.runners.Suite.runChild(Suite.java:128)
   	at org.junit.runners.Suite.runChild(Suite.java:27)
   	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
   	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
   	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
   	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
   	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
   	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
   	at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
   	at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
   	at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
   	at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
   	at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
   	at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
   	at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:175)
   	at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:157)
   	at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:404)
   	at org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:63)
   	at org.gradle.internal.concurrent.ManagedExecutorImpl$1.run(ManagedExecutorImpl.java:46)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at org.gradle.internal.concurrent.ThreadFactoryImpl$ManagedThreadRunnable.run(ThreadFactoryImpl.java:55)
   	at java.lang.Thread.run(Thread.java:748)
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701788175


   In some of the tests, one of the parameters is a String[] instead of a String. Because the current JUnit set up casts the names based on index by calling `.toString()` on them, we wind up with some bad names in these edge cases. Example:
   
   ```md
   testReplacePartitions[catalogName=testhadoop, baseNamespace=[Ljava.lang.String;36c1c7d9], format=PARQUET, isStreaming=false]
   ```
   
   We can fix this in a few ways. One, there is another library that offers us the ability to have more control over parameters: https://github.com/Pragmatists/JUnitParams
   
   However, I'm not so sure about that. Additionally, while baseNamespace is not _exactly_ a namespace, it is in a way. And there is a class `Namespace` with a meaningful `toString` method. We can either create an additional `BaseNamespace` class or simply delegate to `Namespace` and accept that a `BaseNamespace` and a `Namespace` are essentially just references to the different possible lengths of a namespace.
   
   What do others think @rdblue? `baseNamespace` as shown above is the only example I have found. And `Namespace` already has a helper function in the Flink test code:
   
   ```java
     private Namespace toNamespace(String database) {
       String[] namespace = new String[baseNamespace.length + 1];
       System.arraycopy(baseNamespace, 0, namespace, 0, baseNamespace.length);
       namespace[baseNamespace.length] = database;
       return Namespace.of(namespace);
     }
   ```  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498365494



##########
File path: data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java
##########
@@ -222,52 +222,48 @@ public void createParquetInputFile(List<Record> records) throws IOException {
   private final Object readValue;
   private final Object skipValue;
 
-  @Parameterized.Parameters
+  @Parameterized.Parameters(name = "format = {0} column = {1} readValue = {2} skipValue = {3}")
   public static Object[][] parameters() {
     return new Object[][] {
-        new Object[] { "parquet", "boolean", false, true },
-        new Object[] { "parquet", "int", 5, 55 },
-        new Object[] { "parquet", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "parquet", "float", 1.97f, 2.11f },
-        new Object[] { "parquet", "double", 2.11d, 1.97d },
-        new Object[] { "parquet", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
-        new Object[] { "parquet", "timestamp",
-            "2018-06-29T10:02:34.000000",
-            "2018-06-29T15:02:34.000000" },
-        new Object[] { "parquet", "timestamptz",
-            "2018-06-29T10:02:34.000000+00:00",
-            "2018-06-29T10:02:34.000000-07:00" },
-        new Object[] { "parquet", "string", "tapir", "monthly" },
-        // new Object[] { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
-        new Object[] { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
-        new Object[] { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
-        new Object[] { "parquet", "int_decimal", "77.77", "12.34" },
-        new Object[] { "parquet", "long_decimal", "88.88", "12.34" },
-        new Object[] { "parquet", "fixed_decimal", "99.99", "12.34" },
+        { "parquet", "boolean", false, true },
+        { "parquet", "int", 5, 55 },
+        { "parquet", "long", 5_000_000_049L, 5_000L },
+        { "parquet", "float", 1.97f, 2.11f },
+        { "parquet", "double", 2.11d, 1.97d },
+        { "parquet", "date", "2018-06-29", "2018-05-03" },
+        { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "parquet", "timestamp", "2018-06-29T10:02:34.000000", "2018-06-29T15:02:34.000000" },
+        { "parquet", "timestamptz", "2018-06-29T10:02:34.000000+00:00", "2018-06-29T10:02:34.000000-07:00" },
+        { "parquet", "string", "tapir", "monthly" },
+        // { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
+        { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
+        { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
+        { "parquet", "int_decimal", "77.77", "12.34" },
+        { "parquet", "long_decimal", "88.88", "12.34" },
+        { "parquet", "fixed_decimal", "99.99", "12.34" },
 
-        new Object[] { "orc", "boolean", false, true },
-        new Object[] { "orc", "int", 5, 55 },
-        new Object[] { "orc", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "orc", "float", 1.97f, 2.11f },
-        new Object[] { "orc", "double", 2.11d, 1.97d },
-        new Object[] { "orc", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "orc", "boolean", false, true },
+        { "orc", "int", 5, 55 },
+        { "orc", "long", 5_000_000_049L, 5_000L },
+        { "orc", "float", 1.97f, 2.11f },
+        { "orc", "double", 2.11d, 1.97d },
+        { "orc", "date", "2018-06-29", "2018-05-03" },
+        { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
         // Temporarily disable filters on Timestamp columns due to ORC-611
-        // new Object[] { "orc", "timestamp",
+        // { "orc", "timestamp",

Review comment:
       I tried updating but it did not affect the tests. There were several orc related files on the compile time path, but I'm not sure what ran during the tests.
   
   It looks like the issue ORC-611 was closed, but it still exists in https://issues.apache.org/jira/browse/HIVE-23036




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701907253


   > I'd say let's reuse `Namespace`. Thanks!
   
   I agree. I've created an issue to follow up on that as it will ever so slightly require us to change the code in `src` and given the size of this PR, I wanted to keep the changes limited to things underneath `test`. I will follow up. 👍 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497982022



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,24 @@ public void close() throws CatalogException {
     }
   }
 
+  // Going to save this change for another PR as this PR is already really big.

Review comment:
       I have removed this and will open an issue to track all of the places where the test parameters show up as string array references or other wonky `toString` values.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497979964



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,24 @@ public void close() throws CatalogException {
     }
   }
 
+  // Going to save this change for another PR as this PR is already really big.

Review comment:
       Yeah. All of the classes that show up kind of wonky, I'm going to just treat the same in this PR (let them show up as (new String[2]{"hello", "world"}).toString like the JUnit runner wants them to), and then I'll open an issue to follow up on them in a separate PR.
   
   So this PR can be a straight addition of just names to the tests.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497994708



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,24 @@ public void close() throws CatalogException {
     }
   }
 
+  // Going to save this change for another PR as this PR is already really big.

Review comment:
       I've removed those lines and created an issue to update `baseNamespace` to use the `Namespace` class for its human-readable `toString` implementation and its preexisting convenient constructors.
   
   Here's the issue: https://github.com/apache/iceberg/issues/1541. I'll pick it up sometime this week if nobody else does.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498385659



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,7 @@ public void close() throws CatalogException {
     }
   }
 
+

Review comment:
       Can we remove this since we don't need to change this file?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701902606


   I was wondering why the tags didn't get updated for this, but I think it's because I initially opened this as a draft. I've noticed on push in other people's PRs that they _eventually_ populate (like when we added `flink_runtime` to the labeler config and later on the `flink` tag appeared in the PR).
   
   I'll look into this issue and if the tests are somehow not triggering labels appropriately, I'll make a change. But I don't think that's the case here as I have also edited some `flink` files that are not underneath any `test` directory.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#issuecomment-701876419


   I'd say let's reuse `Namespace`. Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498488046



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       Thanks for the clarification. I changed the issue title to `Add in support for vectorized parquet reads with filters on row position in metadata`. How does that sound? I also updated the comment.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497984847



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,24 @@ public void close() throws CatalogException {
     }
   }
 
+  // Going to save this change for another PR as this PR is already really big.

Review comment:
       Ahhh I see your comment about reusing namespace. That makes the most sense to me as well. Given that I'll have to change the non-test code, I'll do i in a separate PR. I'll create an issue for it now and get to it sometime this week.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497998206



##########
File path: mr/src/test/java/org/apache/iceberg/mr/TestIcebergInputFormats.java
##########
@@ -408,6 +408,11 @@ public String name() {
         public TestInputFormat<T> create(Configuration conf) {
           return function.apply(conf);
         }
+
+        @Override
+        public String toString() {
+          return String.format("Test%s<T>", name());

Review comment:
       I'll leave the `toString` method and then open an issue for getting the proper name of the template parameter, given that we test with multiple types substituted for `T`. It can be obtained via java reflection and since this is limited to the test files, there's no real overhead outside of potentially test time (but this isn't called so often as to be a concern).




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497977246



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,24 @@ public void close() throws CatalogException {
     }
   }
 
+  // Going to save this change for another PR as this PR is already really big.

Review comment:
       Can we remove these lines then?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498008283



##########
File path: data/src/test/java/org/apache/iceberg/data/TestMetricsRowGroupFilterTypes.java
##########
@@ -222,52 +222,48 @@ public void createParquetInputFile(List<Record> records) throws IOException {
   private final Object readValue;
   private final Object skipValue;
 
-  @Parameterized.Parameters
+  @Parameterized.Parameters(name = "format = {0} column = {1} readValue = {2} skipValue = {3}")
   public static Object[][] parameters() {
     return new Object[][] {
-        new Object[] { "parquet", "boolean", false, true },
-        new Object[] { "parquet", "int", 5, 55 },
-        new Object[] { "parquet", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "parquet", "float", 1.97f, 2.11f },
-        new Object[] { "parquet", "double", 2.11d, 1.97d },
-        new Object[] { "parquet", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
-        new Object[] { "parquet", "timestamp",
-            "2018-06-29T10:02:34.000000",
-            "2018-06-29T15:02:34.000000" },
-        new Object[] { "parquet", "timestamptz",
-            "2018-06-29T10:02:34.000000+00:00",
-            "2018-06-29T10:02:34.000000-07:00" },
-        new Object[] { "parquet", "string", "tapir", "monthly" },
-        // new Object[] { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
-        new Object[] { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
-        new Object[] { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
-        new Object[] { "parquet", "int_decimal", "77.77", "12.34" },
-        new Object[] { "parquet", "long_decimal", "88.88", "12.34" },
-        new Object[] { "parquet", "fixed_decimal", "99.99", "12.34" },
+        { "parquet", "boolean", false, true },
+        { "parquet", "int", 5, 55 },
+        { "parquet", "long", 5_000_000_049L, 5_000L },
+        { "parquet", "float", 1.97f, 2.11f },
+        { "parquet", "double", 2.11d, 1.97d },
+        { "parquet", "date", "2018-06-29", "2018-05-03" },
+        { "parquet", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "parquet", "timestamp", "2018-06-29T10:02:34.000000", "2018-06-29T15:02:34.000000" },
+        { "parquet", "timestamptz", "2018-06-29T10:02:34.000000+00:00", "2018-06-29T10:02:34.000000-07:00" },
+        { "parquet", "string", "tapir", "monthly" },
+        // { "parquet", "uuid", uuid, UUID.randomUUID() }, // not supported yet
+        { "parquet", "fixed", "abcd".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3 } },
+        { "parquet", "binary", "xyz".getBytes(StandardCharsets.UTF_8), new byte[] { 0, 1, 2, 3, 4, 5 } },
+        { "parquet", "int_decimal", "77.77", "12.34" },
+        { "parquet", "long_decimal", "88.88", "12.34" },
+        { "parquet", "fixed_decimal", "99.99", "12.34" },
 
-        new Object[] { "orc", "boolean", false, true },
-        new Object[] { "orc", "int", 5, 55 },
-        new Object[] { "orc", "long", 5_000_000_049L, 5_000L },
-        new Object[] { "orc", "float", 1.97f, 2.11f },
-        new Object[] { "orc", "double", 2.11d, 1.97d },
-        new Object[] { "orc", "date", "2018-06-29", "2018-05-03" },
-        new Object[] { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
+        { "orc", "boolean", false, true },
+        { "orc", "int", 5, 55 },
+        { "orc", "long", 5_000_000_049L, 5_000L },
+        { "orc", "float", 1.97f, 2.11f },
+        { "orc", "double", 2.11d, 1.97d },
+        { "orc", "date", "2018-06-29", "2018-05-03" },
+        { "orc", "time", "10:02:34.000000", "10:02:34.000001" },
         // Temporarily disable filters on Timestamp columns due to ORC-611
-        // new Object[] { "orc", "timestamp",
+        // { "orc", "timestamp",

Review comment:
       I checked on this, and ORC-611 is closed and the fix versions are 1.6.4 and 1.7.0.
   
   I'm going to see if I can get these tests to pass or open a ticket about possibly upgrading our ORC version if not.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] rdblue merged pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

rdblue merged pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r498483212



##########
File path: flink/src/main/java/org/apache/iceberg/flink/FlinkCatalog.java
##########
@@ -130,6 +130,7 @@ public void close() throws CatalogException {
     }
   }
 
+

Review comment:
       Removed!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org

[GitHub] [iceberg] kbendick commented on a change in pull request #1539: Add names to parameterized tests and simplify the parameters list

Posted by GitBox <gi...@apache.org>.

kbendick commented on a change in pull request #1539:
URL: https://github.com/apache/iceberg/pull/1539#discussion_r497983288



##########
File path: spark/src/test/java/org/apache/iceberg/spark/data/TestSparkParquetReadMetadataColumns.java
##########
@@ -105,12 +105,10 @@
     }
   }
 
-  @Parameterized.Parameters
-  public static Object[][] parameters() {
-    return new Object[][] {
-        new Object[] { false },
-        // new Object[] { true }
-    };
+  @Parameterized.Parameters(name =  "vectorized = {0}")
+  // Note - Does not currently run with the `true` parameter. Why is that?

Review comment:
       I can open an issue for this if you'd like. I know there's been a lot of discussion around the time and cost of reading tables with very large numbers of metadata files, and possibly this would help?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org