Posted to dev@parquet.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2023/05/01 18:08:00 UTC

[jira] [Commented] (PARQUET-2292) Improve default SpecificRecord model selection for Avro{Write,Read}Support

    [ https://issues.apache.org/jira/browse/PARQUET-2292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17718305#comment-17718305 ] 

ASF GitHub Bot commented on PARQUET-2292:
-----------------------------------------

clairemcginty commented on code in PR #1078:
URL: https://github.com/apache/parquet-mr/pull/1078#discussion_r1181768008


##########
parquet-avro/src/main/java/org/apache/parquet/avro/AvroRecordConverter.java:
##########
@@ -169,6 +172,46 @@ public void add(Object value) {
     }
   }
 
+  /**
+   * Returns the specific data model for a given SpecificRecord schema by reflecting the underlying
+   * Avro class's `MODEL$` field, or Null if the class is not on the classpath or reflection fails.
+   */
+  static SpecificData getModelForSchema(Schema schema) {

Review Comment:
   Hi @gszadovszky! Is there anything I can do to improve this PR?





> Improve default SpecificRecord model selection for Avro{Write,Read}Support
> --------------------------------------------------------------------------
>
>                 Key: PARQUET-2292
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2292
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: Claire McGinty
>            Assignee: Claire McGinty
>            Priority: Major
>
> AvroWriteSupport/AvroReadSupport can improve the precision of their default `model` selection. Currently they default to `new SpecificDataSupplier().get()` [0]. This means that SpecificRecord classes that contain logical types will fail out of the box unless a DATA_SUPPLIER is configured whose model contains the logical type conversions.
> I think we can improve this and make logical types work by default by defaulting to the value of the `MODEL$` field that every SpecificRecordBase implementation contains, which already contains all the logical conversions for that Avro type. It would require reflection, but that's what the Avro library is already doing to fetch models for Specific types[1].
>  
> [0] [https://github.com/apache/parquet-mr/blob/d38044f5395494e1543581a4b763f624305d3022/parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java#L403-L407]
> [1] https://github.com/apache/avro/blob/release-1.11.1/lang/java/avro/src/main/java/org/apache/avro/specific/SpecificData.java#L76-L86
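
For readers following the issue description above, here is a rough sketch of the reflection step it proposes: load a class by name, read its static `MODEL$` field, and return null if the class is missing or reflection fails. This is not the code from the PR. To stay self-contained it avoids the Avro dependency, so `FakeModel` and `FakeRecord` are hypothetical stand-ins for `SpecificData` and a generated SpecificRecord class, and `getModelForClassName` is an illustrative name only.

```java
import java.lang.reflect.Field;

final class ModelLookupSketch {

    // Hypothetical stand-in for org.apache.avro.specific.SpecificData.
    static final class FakeModel {
        final String name;

        FakeModel(String name) {
            this.name = name;
        }
    }

    // Hypothetical stand-in for a generated SpecificRecord class that carries
    // its own data model in a static MODEL$ field.
    static final class FakeRecord {
        private static final FakeModel MODEL$ = new FakeModel("with-logical-conversions");
    }

    // A class without a MODEL$ field, to exercise the fallback path.
    static final class NoModelRecord {
    }

    /**
     * Returns the value of the static MODEL$ field declared on the named class,
     * or null if the class cannot be loaded or the field cannot be read.
     */
    static Object getModelForClassName(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            Field modelField = clazz.getDeclaredField("MODEL$");
            // The field may not be public, so make it accessible before reading.
            modelField.setAccessible(true);
            // Passing null reads a static field; no instance is required.
            return modelField.get(null);
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Missing class, missing field, or access failure: let the caller
            // fall back to whatever default model it already uses.
            return null;
        }
    }

    public static void main(String[] args) {
        Object found = getModelForClassName(FakeRecord.class.getName());
        Object missingClass = getModelForClassName("com.example.DoesNotExist");
        Object missingField = getModelForClassName(NoModelRecord.class.getName());

        System.out.println(found instanceof FakeModel ? ((FakeModel) found).name : "null");
        System.out.println(missingClass);
        System.out.println(missingField);
    }
}
```

Returning null rather than throwing keeps the lookup optional, so a caller can keep its existing default model whenever the record class is not on the classpath. In a real integration the class name would presumably be derived from the Avro schema's full name, which is an assumption here and not shown.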



--
This message was sent by Atlassian Jira
(v8.20.10#820010)