You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "gszadovszky (via GitHub)" <gi...@apache.org> on 2023/05/02 08:32:27 UTC

[GitHub] [parquet-mr] gszadovszky commented on a diff in pull request #1078: PARQUET-2292: Default SpecificRecord model reflects from MODEL$ field

gszadovszky commented on code in PR #1078:
URL: https://github.com/apache/parquet-mr/pull/1078#discussion_r1182232703


##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {
+    final Class<?> clazz;
+
+    if (schema != null && (schema.getType() == Schema.Type.RECORD || schema.getType() == Schema.Type.UNION)) {
+      clazz = SpecificData.get().getClass(schema);
+    } else {
+      return null;
+    }
+
+    final SpecificData model;
+    try {
+      final Field modelField = clazz.getDeclaredField("MODEL$");
+      modelField.setAccessible(true);
+
+      model = (SpecificData) modelField.get(null);
+    } catch (Exception e) {
+      return null;
+    }
+
+    try {
+      final String avroVersion = Schema.Parser.class.getPackage().getImplementationVersion();
+      // Avro 1.8 doesn't include conversions in the MODEL$ field
+      if (avroVersion.startsWith("1.8.")) {
+        final Field conversionsField = clazz.getDeclaredField("conversions");
+        conversionsField.setAccessible(true);
+
+        final Conversion<?>[] conversions = (Conversion<?>[]) conversionsField.get(null);
+        Arrays.stream(conversions).filter(Objects::nonNull).forEach(model::addLogicalTypeConversion);
+      }
+    } catch (Exception e) {

Review Comment:
   Similar to the previous comment about logging and catching specific exceptions.



##########
parquet-avro/src/test/java/org/apache/parquet/avro/TestSpecificReadWrite.java:
##########
@@ -237,6 +242,43 @@ public void testAvroReadSchema() throws IOException {
     }
   }
 
+  @Test

Review Comment:
   We need to test both of `getModelForSchema` related to the avro version. If the version check gets more complicated, maybe more versions are to cover.



##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {

Review Comment:
   TBH I do not have a strong opinion on any. I am fine with the current one if it works.



##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {
+    final Class<?> clazz;
+
+    if (schema != null && (schema.getType() == Schema.Type.RECORD || schema.getType() == Schema.Type.UNION)) {
+      clazz = SpecificData.get().getClass(schema);
+    } else {
+      return null;
+    }
+
+    final SpecificData model;
+    try {
+      final Field modelField = clazz.getDeclaredField("MODEL$");
+      modelField.setAccessible(true);
+
+      model = (SpecificData) modelField.get(null);
+    } catch (Exception e) {
+      return null;
+    }
+
+    try {
+      final String avroVersion = Schema.Parser.class.getPackage().getImplementationVersion();
+      // Avro 1.8 doesn't include conversions in the MODEL$ field
+      if (avroVersion.startsWith("1.8.")) {

Review Comment:
   Since we are using reflections on private members there are no compatibility guarantees. We shall be very careful here. What about avro versions prior to 1.8? Also, what if it breaks in the future? Will the related unit test fail for a future Avro releases (in case of upgrading the Avro version in the pom)?



##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {
+    final Class<?> clazz;
+
+    if (schema != null && (schema.getType() == Schema.Type.RECORD || schema.getType() == Schema.Type.UNION)) {
+      clazz = SpecificData.get().getClass(schema);
+    } else {
+      return null;
+    }
+
+    final SpecificData model;
+    try {
+      final Field modelField = clazz.getDeclaredField("MODEL$");
+      modelField.setAccessible(true);
+
+      model = (SpecificData) modelField.get(null);
+    } catch (Exception e) {

Review Comment:
   I think, we should log this exception. I would be also nice to use specific exceptions instead of catching everything. WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@parquet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org