You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/10/22 19:26:41 UTC

[GitHub] [iceberg] Fokko opened a new pull request #1648: Move to Apache Avro 1.10.0

Fokko opened a new pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648


   Moving the Apache Avro 1.10.0. But with two changes:
   
   - The `GenericRecord.get(...)` would return null before <= 1.10.0, but now it will give an exception. When setting a field that doesn't exist, it will give an exception, but this wasn't the case when retrieving a field. When you would have a typo in field name, this would silently return a null. More information in https://issues.apache.org/jira/browse/AVRO-2278
   - The validation of downcasting decimals was made more strict. Therefore with the Parquet to Avro Decimal conversion, there was an implicit downcast from 34 to 10:
   ```
   org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads > testMostlyNullsForOptionalFields FAILED
       org.apache.avro.AvroTypeException: Cannot encode decimal with precision 14 as max precision 10
           at org.apache.avro.Conversions$DecimalConversion.validate(Conversions.java:141)
           at org.apache.avro.Conversions$DecimalConversion.toFixed(Conversions.java:105)
           at org.apache.iceberg.parquet.ParquetAvro$FixedDecimalConversion.toFixed(ParquetAvro.java:177)
           at org.apache.iceberg.parquet.ParquetAvro$FixedDecimalConversion.toFixed(ParquetAvro.java:156)
           at org.apache.parquet.avro.AvroWriteSupport.convert(AvroWriteSupport.java:293)
           at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:276)
           at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:191)
           at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:165)
           at org.apache.iceberg.parquet.ParquetWriteSupport.write(ParquetWriteSupport.java:62)
           at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:128)
           at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:301)
           at org.apache.iceberg.parquet.ParquetWriteAdapter.add(ParquetWriteAdapter.java:45)
           at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:32)
           at org.apache.iceberg.io.FileAppender.addAll(FileAppender.java:37)
           at org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads.writeAndValidate(TestParquetVectorizedReads.java:74)
           at org.apache.iceberg.spark.data.parquet.vectorized.TestParquetVectorizedReads.testMostlyNullsForOptionalFields(TestParquetVectorizedReads.java:175)
   ```
   The current conversion only takes scale into account, so it could be that precision would be lost.
   
   Personally, I like the more explicit messaging when it comes to non-existent fields and loss of precision. But this is a matter of taste.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-805239197


   Yes, I'll fix it right away.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, and 1.9 is still on 2.10.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r553907182



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -28,7 +30,8 @@ private AssertHelpers() {
   }
 
   /**
-   * A convenience method to avoid a large number of @Test(expected=...) tests
+   * A convenience method to avoid a large numb/test/java/org/apache/iceberg/TestHelpers.java

Review comment:
       Good catch, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r568436222



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -137,4 +139,16 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Javadoc looks wrong, cut and paste error?

##########
File path: parquet/src/test/java/org/apache/iceberg/TestHelpers.java
##########
@@ -85,4 +87,17 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Same here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-756723017


   @shardulm94 I'll pick this up again since I'll be convenient for #1972, this will make `Schema` serializable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-805229827


   Looks like this would fix #1654 so we should get it into the 0.12.0 release. I don't think we should update Avro with a breaking change in a patch release so it isn't a good idea to try to get this into 0.11.1.
   
   @Fokko, if you have time please update to 1.10.2 and I'll merge. Otherwise, I'll merge this in a day or so and we can update again. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, and 1.9 is still on 2.10.
   
   Spark 3.0.1 is still on Jackson 2.10:
   ```
   +--- org.apache.spark:spark-hive_2.12 -> 3.0.1
   |    +--- org.apache.spark:spark-core_2.12:3.0.1
   |    |    +--- com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0 -> 2.10.2
   |    |    |    +--- org.scala-lang:scala-library:2.12.10
   |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |    \--- com.fasterxml.jackson.module:jackson-module-paranamer:2.10.2
   |    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |         \--- com.thoughtworks.paranamer:paranamer:2.8
   ```
   https://github.com/apache/spark/blob/v3.0.1/pom.xml#L175


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r579886035



##########
File path: core/src/test/java/org/apache/iceberg/avro/TestReadProjection.java
##########
@@ -221,10 +221,10 @@ public void testNestedStructProjection() throws Exception {
     );
 
     projected = writeAndRead("latitude_only", writeSchema, latOnly, record);
-    projectedLocation = (Record) projected.get("location");
-    Assert.assertNull("Should not project id", projected.get("id"));
+    Record projectedLocation = (Record) projected.get("location");
+    AssertHelpers.assertEmptyAvroField(projected, "id");
     Assert.assertNotNull("Should project location", projected.get("location"));
-    Assert.assertNull("Should not project longitude", projectedLocation.get("long"));
+    AssertHelpers.assertEmptyAvroField(projected, "long");

Review comment:
       I think this should still use `projectedLocation`. Looks like it might just be a typo.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-875167472


   @davseitsev, we don't typically change dependency versions in patch releases. I think that's a surprise to people that depend on very few changes in patches. This would be a good one to get into 0.12.0 (release in the next couple weeks) if we can get tests passing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-748262031


   @Fokko I am also interested in getting Iceberg onto Avro 1.10.1. Are you still working on this?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r592235285



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetAvro.java
##########
@@ -169,12 +164,14 @@ public String getLogicalTypeName() {
 
     @Override
     public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
-      return super.fromFixed(value, schema, decimalsByScale[((ParquetDecimal) type).scale()]);
+      final ParquetDecimal dec = (ParquetDecimal) type;
+      return super.fromFixed(value, schema, LogicalTypes.decimal(dec.precision(), dec.scale()));

Review comment:
       I know, but we didn't take the scale into account, so it could actually change the scale. I fixed this in the `fromFixed` by directly creating the `BigDecimal`, this is what's happening in the Avro library:
   ```
       @Override
       public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
         int scale = ((LogicalTypes.Decimal) type).getScale();
         return new BigDecimal(new BigInteger(value.bytes()), scale);
       }
   ```
   For the `toFixed` this isn't that trivial, since this would mean copying a lot of code.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r594579827



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetAvro.java
##########
@@ -169,12 +164,14 @@ public String getLogicalTypeName() {
 
     @Override
     public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
-      return super.fromFixed(value, schema, decimalsByScale[((ParquetDecimal) type).scale()]);
+      final ParquetDecimal dec = (ParquetDecimal) type;
+      return super.fromFixed(value, schema, LogicalTypes.decimal(dec.precision(), dec.scale()));

Review comment:
       I think it's fine to delegate to `toFixed`. I just don't want to call `LogicalTypes.decimal` every time to create a new decimal `LogicalType` with the same scale and precision. I think just adding a cache keyed by a `Pair` of scale and precision would fix it.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-805284872


   I think a Jackson dependency update in Avro breaks Spark:
   
   ```
   com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.11.4 requires Jackson Databind version >= 2.11.0 and < 2.12.0
   ```
   
   Looks like Avro updated Jackson from 2.11.3 to 2.12.2, which is what causes the problem. We should revert the latest update to 1.10.2 and use 1.10.1 to fix the problem, I guess.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r571874950



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -137,4 +139,16 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Good catch, thanks!

##########
File path: parquet/src/test/java/org/apache/iceberg/TestHelpers.java
##########
@@ -85,4 +87,17 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Good catch, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-771435975


   @rdblue Sure thing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759135813


   @Fokko I see the Spark tests are failing with this error
   ```
       java.lang.ExceptionInInitializerError
   
           Caused by:
           com.fasterxml.jackson.databind.JsonMappingException: Scala module 2.10.2 requires Jackson Databind version >= 2.10.0 and < 2.11.0
   ```
   
   This happens because Jackson expects `com.fasterxml.jackson.core` versions to match `com.fasterxml.jackson.module` and that doesn't seem to be the case here. I ran a gradle build scan and seems like `jackson-databind` version is being upgraded to 2.11 [because of Avro 1.10.0](https://issues.apache.org/jira/browse/AVRO-2853) which is causing this issue.
   [Gradle build scan link](https://scans.gradle.com/s/77u6s7hbyyqak/dependencies?configurationFilter=WyJ0ZXN0UnVudGltZUNsYXNzcGF0aCJd&dependencies=jackson&expandAll&focusedDependency=WzIxLDcsNzUsWzAsMCxbNl1dXQ&focusedDependencyView=versions&projectFilter=WyI6aWNlYmVyZy1zcGFyazMiXQ)
   
   I think we may need to bump Iceberg's jackson version in [`version.props`](https://github.com/apache/iceberg/blob/01fca3d0a3c5653fe5a4a2a88c29aeaffca33f1d/versions.props#L14) and in [`build.gradle`](https://github.com/apache/iceberg/blob/01fca3d0a3c5653fe5a4a2a88c29aeaffca33f1d/build.gradle#L97)
   
   @rdblue I am not a 100% sure if this is the right thing to do. I know Spark can be finicky about the Jackson version. Do you have any insights here?


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-756723017


   @shardulm94 I'll pick this up again since I'll be convenient for #1972, this will make `Schema` serializable.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r568436222



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -137,4 +139,16 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Javadoc looks wrong, cut and paste error?

##########
File path: parquet/src/test/java/org/apache/iceberg/TestHelpers.java
##########
@@ -85,4 +87,17 @@ private static void handleException(String message,
       throw e;
     }
   }
+
+  /**
+   * A convenience method to assert the cause of thrown exception.

Review comment:
       Same here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759139464


   @shardulm94, our Jackson version should be okay to update. We shade and relocate it in the runtime Jars to avoid conflicts with Spark.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] dongjoon-hyun commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r599151400



##########
File path: versions.props
##########
@@ -1,5 +1,5 @@
 org.slf4j:* = 1.7.25
-org.apache.avro:avro = 1.9.2
+org.apache.avro:avro = 1.10.1

Review comment:
       Hi, @Fokko .
   Could you update this PR with `1.10.2`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] massdosage commented on a change in pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
massdosage commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r551433064



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -28,7 +30,8 @@ private AssertHelpers() {
   }
 
   /**
-   * A convenience method to avoid a large number of @Test(expected=...) tests
+   * A convenience method to avoid a large numb/test/java/org/apache/iceberg/TestHelpers.java

Review comment:
       This comment looks like the victim of an unintended search and replace.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-771203997


   @Fokko, could you rebase this? Now that the Jackson update is in, I think we can work on it.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r579886741



##########
File path: parquet/src/main/java/org/apache/iceberg/parquet/ParquetAvro.java
##########
@@ -169,12 +164,14 @@ public String getLogicalTypeName() {
 
     @Override
     public BigDecimal fromFixed(GenericFixed value, Schema schema, LogicalType type) {
-      return super.fromFixed(value, schema, decimalsByScale[((ParquetDecimal) type).scale()]);
+      final ParquetDecimal dec = (ParquetDecimal) type;
+      return super.fromFixed(value, schema, LogicalTypes.decimal(dec.precision(), dec.scale()));

Review comment:
       This change results in creating a new `Decimal` for every value. While I don't think that we use this very much, I would rather not introduce object allocation for every conversion. Can you add a static cache for these? That was the purpose of `decimalsByScale`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, and 1.9 is still on 2.10.
   
   Spark 3.0.1 is still on Jackson 2.10:
   ```
   +--- org.apache.spark:spark-hive_2.12 -> 3.0.1
   |    +--- org.apache.spark:spark-core_2.12:3.0.1
   |    |    +--- com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0 -> 2.10.2
   |    |    |    +--- org.scala-lang:scala-library:2.12.10
   |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |    \--- com.fasterxml.jackson.module:jackson-module-paranamer:2.10.2
   |    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |         \--- com.thoughtworks.paranamer:paranamer:2.8
   ```
   https://github.com/apache/spark/blob/v3.0.1/pom.xml#L175
   
   Pulling a newer version into the classpath will create a mismatch between the Jackson versions.
   
   They've bumped Jackson in Spark 3.1, for exactly the issue that we're running into: https://github.com/apache/spark/commit/01b73ae6388279514d61c14a9dc9718a34dad465


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, and 1.9 is still on 2.10.
   
   Spark 3.0.1 is still on Jackson 2.10:
   ```
   +--- org.apache.spark:spark-hive_2.12 -> 3.0.1
   |    +--- org.apache.spark:spark-core_2.12:3.0.1
   |    |    +--- com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0 -> 2.10.2
   |    |    |    +--- org.scala-lang:scala-library:2.12.10
   |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |    \--- com.fasterxml.jackson.module:jackson-module-paranamer:2.10.2
   |    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |         \--- com.thoughtworks.paranamer:paranamer:2.8
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko edited a comment on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko edited a comment on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-759304104


   Not 100% sure if this is the case. The Jackson module in Avro isn't shaded, so that might cause issues with Spark 2. Avro 1.10.1 uses Jackson 2.11, and 1.9 is still on 2.10.
   
   Spark 3.0.1 is still on Jackson 2.10:
   ```
   +--- org.apache.spark:spark-hive_2.12 -> 3.0.1
   |    +--- org.apache.spark:spark-core_2.12:3.0.1
   |    |    +--- com.fasterxml.jackson.module:jackson-module-scala_2.12:2.10.0 -> 2.10.2
   |    |    |    +--- org.scala-lang:scala-library:2.12.10
   |    |    |    +--- com.fasterxml.jackson.core:jackson-core:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-annotations:2.10.2
   |    |    |    +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |    \--- com.fasterxml.jackson.module:jackson-module-paranamer:2.10.2
   |    |    |         +--- com.fasterxml.jackson.core:jackson-databind:2.10.2 (*)
   |    |    |         \--- com.thoughtworks.paranamer:paranamer:2.8
   ```
   https://github.com/apache/spark/blob/v3.0.1/pom.xml#L175
   
   Pulling a newer version into the classpath will create a mismatch between the Jackson versions.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-771435975


   @rdblue Sure thing!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r592222488



##########
File path: core/src/test/java/org/apache/iceberg/avro/TestReadProjection.java
##########
@@ -221,10 +221,10 @@ public void testNestedStructProjection() throws Exception {
     );
 
     projected = writeAndRead("latitude_only", writeSchema, latOnly, record);
-    projectedLocation = (Record) projected.get("location");
-    Assert.assertNull("Should not project id", projected.get("id"));
+    Record projectedLocation = (Record) projected.get("location");
+    AssertHelpers.assertEmptyAvroField(projected, "id");
     Assert.assertNotNull("Should project location", projected.get("location"));
-    Assert.assertNull("Should not project longitude", projectedLocation.get("long"));
+    AssertHelpers.assertEmptyAvroField(projected, "long");

Review comment:
       Good catch, probably a copy-paste!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r556356944



##########
File path: versions.props
##########
@@ -1,5 +1,5 @@
 org.slf4j:* = 1.7.25
-org.apache.avro:avro = 1.9.2
+org.apache.avro:avro = 1.10.0

Review comment:
       I've bumped it to 1.10.1




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] shardulm94 commented on a change in pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
shardulm94 commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r556033683



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -29,6 +31,7 @@ private AssertHelpers() {
 
   /**
    * A convenience method to avoid a large number of @Test(expected=...) tests
+   * of @Test(expected=...) tests

Review comment:
       Remove this? Seems to be a remnant of the previous search/replace.

##########
File path: versions.props
##########
@@ -1,5 +1,5 @@
 org.slf4j:* = 1.7.25
-org.apache.avro:avro = 1.9.2
+org.apache.avro:avro = 1.10.0

Review comment:
       Can we bump this to Avro 1.10.1? I was specifically interested in https://issues.apache.org/jira/browse/AVRO-2817. I can also create a followup PR if you prefer and leave it at 1.10.0 here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.0

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r553907182



##########
File path: api/src/test/java/org/apache/iceberg/AssertHelpers.java
##########
@@ -28,7 +30,8 @@ private AssertHelpers() {
   }
 
   /**
-   * A convenience method to avoid a large number of @Test(expected=...) tests
+   * A convenience method to avoid a large numb/test/java/org/apache/iceberg/TestHelpers.java

Review comment:
       Good catch, thanks!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-782953035


   @Fokko, looks correct overall, although there is one error in a test case. I'd also like to fix the allocation problem the the fixed converter.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a change in pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
Fokko commented on a change in pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#discussion_r599931279



##########
File path: versions.props
##########
@@ -1,5 +1,5 @@
 org.slf4j:* = 1.7.25
-org.apache.avro:avro = 1.9.2
+org.apache.avro:avro = 1.10.1

Review comment:
       Done! Thanks for pointing out.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-805229827


   Looks like this would fix #1654 so we should get it into the 0.12.0 release. I don't think we should update Avro with a breaking change in a patch release so it isn't a good idea to try to get this into 0.11.1.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue merged pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue merged pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] davseitsev commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
davseitsev commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-868307715


   Is there any workaround for this issue? Are you going to release 0.11.2 with this fix?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1648: Move to Apache Avro 1.10.1

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1648:
URL: https://github.com/apache/iceberg/pull/1648#issuecomment-879209798


   Merged! Thanks for updating this, @Fokko!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org