Posted to commits@hudi.apache.org by "kazdy (via GitHub)" <gi...@apache.org> on 2023/03/03 11:48:30 UTC

[GitHub] [hudi] kazdy commented on issue #8018: [SUPPORT] why is the schema evolution done while not setting hoodie.schema.on.read.enable

kazdy commented on issue #8018:
URL: https://github.com/apache/hudi/issues/8018#issuecomment-1453414031

   @danny0405 isn't it that with `hoodie.schema.on.read.enable=false` Hudi falls back on the default "out of the box" schema evolution, which uses Avro schema resolution (and allows new columns to be added at the end of the schema)?
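   
   Roughly what I mean, as a minimal PySpark sketch (table name, keys, and paths are made up, and it assumes the Hudi Spark bundle is on the classpath):
   
   ```python
   from pyspark.sql import SparkSession
   
   spark = SparkSession.builder.appName("hudi-schema-evolution-demo").getOrCreate()
   
   # hypothetical table name, keys, and path
   hudi_opts = {
       "hoodie.table.name": "demo_tbl",
       "hoodie.datasource.write.recordkey.field": "id",
       "hoodie.datasource.write.precombine.field": "ts",
       # hoodie.schema.on.read.enable is left unset, i.e. false
   }
   
   # first batch: (id, ts, name)
   df1 = spark.createDataFrame([(1, 100, "a")], ["id", "ts", "name"])
   df1.write.format("hudi").options(**hudi_opts).mode("overwrite").save("/tmp/demo_tbl")
   
   # second batch appends a new column at the end: (id, ts, name, city)
   # default Avro schema resolution accepts the wider schema
   df2 = spark.createDataFrame([(2, 200, "b", "NYC")], ["id", "ts", "name", "city"])
   df2.write.format("hudi").options(**hudi_opts).mode("append").save("/tmp/demo_tbl")
   
   spark.read.format("hudi").load("/tmp/demo_tbl").printSchema()  # 'city' is now in the schema
   ```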
   
   When it comes to reconciling schemas, when I was playing with it in 0.10 and 0.11 it allowed wider schemas on write, but when incoming columns were missing, these were added back to match the "current" target schema.
   So for me schema reconciliation worked like this (see the sketch after this list):
   wider schema -> accept as new table schema
   missing columns -> add missing columns to match current table schema
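   
   A sketch of the second case, continuing from the snippet above and assuming `hoodie.datasource.write.reconcile.schema=true`:
   
   ```python
   # continuing from the snippet above: turn on schema reconciliation
   reconcile_opts = dict(hudi_opts, **{"hoodie.datasource.write.reconcile.schema": "true"})
   
   # third batch is missing the 'city' column
   df3 = spark.createDataFrame([(3, 300, "c")], ["id", "ts", "name"])
   
   # with reconciliation on, the missing 'city' column is added back (as null)
   # so the written records match the current table schema
   df3.write.format("hudi").options(**reconcile_opts).mode("append").save("/tmp/demo_tbl")
   ```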
   
   There's a hacky way to prevent schema evolution.
   One can get the schema from the metastore or from a file containing the Avro schema definition for the table, read it in your Spark job, and pass it via https://hudi.apache.org/docs/configurations#hoodiewriteschema (`hoodie.write.schema`), which should override the writer schema and effectively drop new columns.
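   
   A sketch of that workaround (the schema file path is made up; `hoodie.write.schema` takes the Avro schema as a JSON string, and the column-dropping effect is what I observed, not a documented guarantee):
   
   ```python
   # hypothetical path to a pinned Avro schema definition for the table
   with open("/schemas/demo_tbl.avsc") as f:
       pinned_schema = f.read()
   
   pinned_opts = dict(hudi_opts, **{
       # overrides the schema Hudi would otherwise derive from the incoming batch;
       # per the workaround above, extra incoming columns should be dropped
       "hoodie.write.schema": pinned_schema,
   })
   
   # incoming batch carries an unexpected extra column
   df4 = spark.createDataFrame([(4, 400, "d", "oops")], ["id", "ts", "name", "extra"])
   df4.write.format("hudi").options(**pinned_opts).mode("append").save("/tmp/demo_tbl")
   ```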
   
   Again, a MERGE INTO statement enforces the target table schema when writing records.
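   
   For example (hypothetical table/view names; assumes the Hudi table is registered in the catalog as `demo_tbl`):
   
   ```python
   # hypothetical source view
   df5 = spark.createDataFrame([(5, 500, "e")], ["id", "ts", "name"])
   df5.createOrReplaceTempView("updates")
   
   # MERGE INTO writes with the target table's schema, so extra
   # source-side columns do not evolve the target
   spark.sql("""
       MERGE INTO demo_tbl AS t
       USING updates AS s
       ON t.id = s.id
       WHEN MATCHED THEN UPDATE SET t.ts = s.ts, t.name = s.name
       WHEN NOT MATCHED THEN INSERT (id, ts, name) VALUES (s.id, s.ts, s.name)
   """)
   ```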

