Posted to commits@hudi.apache.org by "TengHuo (via GitHub)" <gi...@apache.org> on 2023/02/02 06:31:55 UTC

[GitHub] [hudi] TengHuo commented on pull request #7307: [HUDI-5271] fix issue inconsistent reader and writer schema in HoodieAvroDataBlock

TengHuo commented on PR #7307:
URL: https://github.com/apache/hudi/pull/7307#issuecomment-1413220811

   Agree @danny0405, I think it's better to unify Avro schema handling across Spark and Flink in Hudi.
   
   Currently, we have the Avro schema utility class `org.apache.hudi.avro.AvroSchemaUtils` in the `hudi-common` module for manipulating Avro schemas. Hudi Spark uses `org.apache.spark.sql.avro.SchemaConverters` to convert between Spark DataType and Avro schema, and Hudi Flink uses `org.apache.hudi.util.AvroSchemaConverter` to convert between Flink DataType and Avro schema.
   
   I noticed that the two converters behave differently when setting the name of a newly created Avro schema.
   
   **On the Spark side**, the record name and namespace of the Avro schema are exposed as method parameters.
   
   ```scala
     /**
      * Converts a Spark SQL schema to a corresponding Avro schema.
      *
      * @since 2.4.0
      */
     def toAvroType(catalystType: DataType,
                    nullable: Boolean = false,
                    recordName: String = "topLevelRecord",
                    nameSpace: String = ""): Schema
   ```
   
   reference: https://github.com/apache/hudi/blob/41653fc708854828bacb23ed624ca6b3a67d6737/hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/spark/sql/avro/SchemaConverters.scala#L154
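   
   For example, here is a minimal usage sketch (not taken from the PR; the field names, record name and namespace are made up) showing that the caller controls the generated record name:
   
   ```java
   // Hypothetical usage sketch: the record name and namespace of the generated
   // Avro schema are chosen by the caller of toAvroType.
   import org.apache.avro.Schema;
   import org.apache.spark.sql.avro.SchemaConverters;
   import org.apache.spark.sql.types.DataTypes;
   import org.apache.spark.sql.types.Metadata;
   import org.apache.spark.sql.types.StructField;
   import org.apache.spark.sql.types.StructType;
   
   public class SparkAvroNameExample {
     public static void main(String[] args) {
       StructType structType = new StructType(new StructField[] {
           new StructField("id", DataTypes.LongType, false, Metadata.empty()),
           new StructField("name", DataTypes.StringType, true, Metadata.empty())
       });
   
       // All four arguments are passed explicitly because the Scala default
       // values are not visible from Java.
       Schema avroSchema = SchemaConverters.toAvroType(
           structType, false, "trip_record", "hoodie.source");
   
       // Prints "hoodie.source.trip_record": the full name is caller-defined.
       System.out.println(avroSchema.getFullName());
     }
   }
   ```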
   
   **On the Flink side**, a constant name `"record"` is used:
   
   ```java
     /**
      * Converts Flink SQL {@link LogicalType} (can be nested) into an Avro schema.
      *
      * <p>Use "record" as the type name.
      *
      * @param schema the schema type, usually it should be the top level record type, e.g. not a
      *               nested type
      * @return Avro's {@link Schema} matching this logical type.
      */
     public static Schema convertToSchema(LogicalType schema) {
       return convertToSchema(schema, "record");
     }
   ```
   
   reference: https://github.com/apache/hudi/blob/8ffcb2fc9470077bdcf3810756545d081fb6523c/hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/util/AvroSchemaConverter.java#L202 
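   
   A similar sketch for the Flink side (again just an illustration, the row fields are made up): whatever table we convert, the top-level record is always named `record`.
   
   ```java
   // Hypothetical usage sketch: the generated schema name is the hard-coded
   // "record", independent of the table being converted.
   import java.util.Arrays;
   
   import org.apache.avro.Schema;
   import org.apache.flink.table.types.logical.BigIntType;
   import org.apache.flink.table.types.logical.RowType;
   import org.apache.flink.table.types.logical.VarCharType;
   import org.apache.hudi.util.AvroSchemaConverter;
   
   public class FlinkAvroNameExample {
     public static void main(String[] args) {
       // Use a non-nullable top-level row so the converted schema is a plain
       // record (nullable types get wrapped in a union with null).
       RowType rowType = new RowType(false, Arrays.asList(
           new RowType.RowField("id", new BigIntType()),
           new RowType.RowField("name", new VarCharType(VarCharType.MAX_LENGTH))));
   
       Schema avroSchema = AvroSchemaConverter.convertToSchema(rowType);
   
       // Prints "record": the name is fixed inside the converter.
       System.out.println(avroSchema.getName());
     }
   }
   ```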
   
   (please correct me if I'm wrong)
   
   May I know if it is possible to unify all of the non-engine-related schema handling, e.g. the name conversion rule, in one place?
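   
   Purely as an illustration of what I mean (the class and method below are hypothetical, they don't exist in Hudi today), such a naming rule could live in `hudi-common` and be reused by every engine-specific converter:
   
   ```java
   // Hypothetical sketch of an engine-agnostic naming rule in hudi-common;
   // none of these names exist in the current code base.
   public final class AvroSchemaNamingUtils {
   
     public static final String DEFAULT_RECORD_NAME = "topLevelRecord";
     public static final String DEFAULT_NAMESPACE = "";
   
     private AvroSchemaNamingUtils() {
     }
   
     /**
      * Resolves the top-level Avro record name so that Spark, Flink and any
      * other writer produce schemas with the same name for the same table.
      */
     public static String resolveRecordName(String tableName) {
       return tableName == null || tableName.isEmpty() ? DEFAULT_RECORD_NAME : tableName;
     }
   }
   ```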


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org