Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/10/15 08:10:37 UTC

[GitHub] [hudi] parisni commented on issue #6558: [SUPPORT] Parquet/Avro schema mismatch: Avro field not found

parisni commented on issue #6558:
URL: https://github.com/apache/hudi/issues/6558#issuecomment-1279691815

   Thanks @alexeykudinkin, this clarifies things. Can you confirm that the Avro schemas come from the last commit in the timeline?
   
   Also, we have identified why our field changed case over time. The Hive metastore is case insensitive, so when you populate it with upper-case column names, it returns them in lower case. However, the first time Spark reads a metastore table, it infers the schema from the Parquet files and stores it in the metastore as table properties, which are case sensitive. Afterwards, Spark reads those properties from the metastore. When the properties and the Hive information diverge (in case of schema evolution), Spark falls back to reading only the Hive information and ignores the properties. The columns then suddenly become lower case, which ultimately breaks Hudi, as we saw in this issue.
   
   In the end, we had to recreate the table from scratch. We now prevent Spark from writing those properties by giving it read-only access to the metastore.
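
   To illustrate the kind of mismatch described above, here is a minimal Python sketch that compares the field names of a writer schema with the column names returned by a metastore and reports those that differ only by case. It is not Hudi or Spark code: the function name `find_case_mismatches` and the field names are hypothetical, chosen only for illustration.

```python
def find_case_mismatches(writer_fields, metastore_columns):
    """Return (writer_name, metastore_name) pairs that differ only by case."""
    # Index the writer-side names by their lower-cased form.
    by_lower = {name.lower(): name for name in writer_fields}
    mismatches = []
    for column in metastore_columns:
        original = by_lower.get(column.lower())
        # Same name ignoring case, but not an exact match.
        if original is not None and original != column:
            mismatches.append((original, column))
    return mismatches


# Hypothetical field names, for illustration only.
avro_fields = ["id", "eventTime", "UserName"]
hive_columns = ["id", "eventtime", "username"]

for avro_name, hive_name in find_case_mismatches(avro_fields, hive_columns):
    print(f"case mismatch: Avro field {avro_name!r} vs metastore column {hive_name!r}")
```

   A check like this could be run before a write to catch a lower-cased schema early; a case-sensitive lookup of `eventtime` in the Avro schema would otherwise fail with a "field not found" style error.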

