Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/07 12:03:43 UTC

[GitHub] [hudi] nsivabalan commented on pull request #2012: [HUDI-1129] Deltastreamer Add support for schema evolution

nsivabalan commented on pull request #2012:
URL: https://github.com/apache/hudi/pull/2012#issuecomment-834312959


   > I spent some time understanding this PR. Thanks for putting it up, @sathyaprakashg. I have a few clarifications.
   > 
   > 1. Can you update the description to reflect the latest status? I don't see SchemaBasedSchemaProvider etc.
   > 2. FYI, we landed a [fix](https://github.com/apache/hudi/pull/2765) for default values and null in unions. If the schema post-processing is not required at all with this fix, it would simplify things. I guess the namespace fix in this PR may not be needed if the post-processing step is not required. @bvaradar @n3nash: can you folks chime in here, please? Another related item is the [fixed datatype jira](https://issues.apache.org/jira/browse/HUDI-1607). The backwards incompatibility may not be an issue if we go this route. 
   > 3. Also, I pulled the test locally and tried to verify things. It looks like the test is not generating records as intended in the 3rd step. Here is what is happening:
   >    
   >    * TestDataSource generates data with the intended (old) schema.
   >    * But in SourceFormatAdapter, when we call AvroConversionUtils.createDataFrame(...), the evolved schema is passed in, so the InputBatch<Dataset> returned from here has the new column set to null for all records.
   >    * I also verified this from within the IdentityTransformer, which showed the evolved schema, with each record having the new column as well.
   >      So, essentially, the test also needs to be fixed.
   
   @vinothchandar: We need to iron out the perf issue, but these were my earlier comments. They could simplify the backwards compatibility issue that was being discussed. 
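
To make point 3 of the quoted comment concrete, here is a minimal plain-Python sketch of the effect described there. It is not Hudi or Avro code: `project`, `EVOLVED_DEFAULTS`, and the field names are hypothetical stand-ins, and the dictionary projection only imitates what happens when records produced with the old schema are converted using the evolved schema.

```python
# Hypothetical stand-in for schema resolution; this is NOT Hudi or Avro code.
# Evolved schema: the two old fields plus one new nullable column.
# Each field maps to its default value (None plays the role of null).
EVOLVED_DEFAULTS = {"id": None, "name": None, "new_col": None}


def project(record, target_defaults):
    """Project a record onto the target schema, filling absent fields with defaults."""
    return {
        field: record.get(field, default)
        for field, default in target_defaults.items()
    }


# Records as the test source generates them: old schema only, no "new_col".
old_records = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]

# Converting with the evolved schema adds the column to every record...
projected = [project(r, EVOLVED_DEFAULTS) for r in old_records]

# ...but only ever as null, so the evolved column never carries real data.
all_null = all(r["new_col"] is None for r in projected)

print(projected)
print(all_null)
```

Under these assumptions, every projected record has the evolved shape but no real value in the new column, which is why a test built this way would not exercise records genuinely written with the evolved schema.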

