Posted to commits@hudi.apache.org by "xiarixiaoyao (via GitHub)" <gi...@apache.org> on 2023/02/24 02:28:22 UTC

[GitHub] [hudi] xiarixiaoyao commented on a diff in pull request #8026: [HUDI-5835] After performing the update operation, the hoodie table cannot be read normally by spark

xiarixiaoyao commented on code in PR #8026:
URL: https://github.com/apache/hudi/pull/8026#discussion_r1116437743


##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieBaseRelation.scala:
##########
@@ -155,12 +158,13 @@ abstract class HoodieBaseRelation(val sqlContext: SQLContext,
       }
     }
 
+    val avroNameAndSpace = AvroConversionUtils.getAvroRecordNameAndNamespace(tableName)
     val avroSchema = internalSchemaOpt.map { is =>
-      AvroInternalSchemaConverter.convert(is, "schema")
+      AvroInternalSchemaConverter.convert(is, avroNameAndSpace._2 + "." + avroNameAndSpace._1)

Review Comment:
   @alexeykudinkin   thanks for your review.
   1) Schema evolution has nothing to do with this scenario, since schema evolution calls HoodieAvroUtils.rewriteRecordWithNewSchema to unify the namespace. I changed this line only to ensure that the namespaces of the read schema and the write schema are consistent.
   2) The namespace of the schema hudi uses when writing the log comes from the table name, but the namespace of the read schema is "schema".
   3) When schema evolution is not enabled, for decimal types, different namespaces produce different full names, and Avro is name-sensitive. We should keep the read schema and the write schema in the same namespace, just as previous versions of hudi did.
   eg:
   ff decimal(38, 10)
   
   hudi log write schema will be:
   {"name":"ff","type":[{"type":"fixed","name":"fixed","namespace":"hoodie.h0.h0_record.ff","size":16,"logicalType":"decimal","precision":38,"scale":10},"null"]}
   
   spark read schema will be:
   {"name":"ff","type":[{"type":"fixed","name":"fixed","namespace":"Record.ff","size":16,"logicalType":"decimal","precision":38,"scale":10},"null"]}
   
   The read schema and the write schema are incompatible, so we cannot use the read schema to read the log. Previous versions of hudi did not have this problem.
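   The name sensitivity in point 3 can be sketched without pulling in Avro itself: Avro identifies named types (record, enum, fixed) by their full name, i.e. namespace + "." + name, so the same "fixed" type under two different namespaces is treated as two different types during schema resolution. The class and helper below (`FixedFullNameMismatch`, `fullName`) are illustrative names, not Hudi or Avro APIs; the namespaces are taken from the write/read schemas above.
   
   ```java
   // Minimal sketch (no Avro dependency): Avro resolves named types
   // by their full name, namespace + "." + name.
   public class FixedFullNameMismatch {
       // Hypothetical helper mirroring Avro's full-name rule.
       static String fullName(String namespace, String name) {
           return namespace.isEmpty() ? name : namespace + "." + name;
       }
   
       public static void main(String[] args) {
           // Namespaces from the write/read schemas above (table h0).
           String writeFixed = fullName("hoodie.h0.h0_record.ff", "fixed");
           String readFixed  = fullName("Record.ff", "fixed");
           System.out.println(writeFixed); // hoodie.h0.h0_record.ff.fixed
           System.out.println(readFixed);  // Record.ff.fixed
           // Different full names: the ResolvingDecoder treats them as
           // distinct types, matching the "Found hoodie.h0.h0_record.ff.fixed,
           // expecting union" failure in the stack trace below.
           System.out.println(writeFixed.equals(readFixed)); // false
       }
   }
   ```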
   
   
   
   Caused by: org.apache.avro.AvroTypeException: Found hoodie.h0.h0_record.ff.fixed, expecting union
   	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)
   	at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
   	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:275)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:188)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
   	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:260)
   	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:248)
   	at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:180)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:161)
   	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
   	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:201)
   	at org.apache.hudi.common.table.log.block.HoodieAvroDataBlock$RecordIterator.next(HoodieAvroDataBlock.java:149)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org