Posted to commits@hudi.apache.org by "sivabalan narayanan (Jira)" <ji...@apache.org> on 2021/02/06 20:41:00 UTC

[jira] [Updated] (HUDI-837) Fix AvroKafkaSource to use the latest schema for reading

     [ https://issues.apache.org/jira/browse/HUDI-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

sivabalan narayanan updated HUDI-837:
-------------------------------------
    Labels: pull-request-available sev:critical user-support-issues  (was: pull-request-available user-support-issues)

> Fix AvroKafkaSource to use the latest schema for reading
> --------------------------------------------------------
>
>                 Key: HUDI-837
>                 URL: https://issues.apache.org/jira/browse/HUDI-837
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: pull-request-available, sev:critical, user-support-issues
>             Fix For: 0.8.0
>
>
> Currently, AvroKafkaSource sets value.deserializer to KafkaAvroDeserializer. As a result, each published record is read with the schema it was written with, even if the schema has evolved since then. Messages in a single incoming batch can therefore have different schemas, and this has to be handled when the records are actually written to Parquet.
> This Jira aims to provide an option to read all messages in a batch with the same schema by implementing a new custom deserializer class.
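To make the difference concrete, here is a minimal, library-free Python sketch of reading a mixed batch with one latest schema. It models Avro's record schema-resolution rules: fields are matched by name, a reader default fills any field the writer lacked, writer-only fields are dropped, and an error is raised when a reader field has no default. It is an illustration under loud assumptions, not Hudi's deserializer and not the Avro or Confluent API. Schemas are reduced to hypothetical maps from field name to default value, field types and type promotion are ignored, and the names NO_DEFAULT, resolve_record and read_batch are invented for this example.

```python
# Hypothetical, simplified model of Avro record schema resolution.
# A "schema" here is just {field_name: default_value}; NO_DEFAULT marks a
# field without a default. Real Avro schemas also carry types, aliases, etc.

NO_DEFAULT = object()  # sentinel: the field has no default value


def resolve_record(record, writer_fields, reader_fields):
    """Project a record written with writer_fields onto reader_fields."""
    resolved = {}
    for name, default in reader_fields.items():
        if name in writer_fields:
            # Field exists in both schemas: keep the written value.
            resolved[name] = record[name]
        elif default is not NO_DEFAULT:
            # Field was added after this record was written: use the default.
            resolved[name] = default
        else:
            raise ValueError(
                f"field {name!r} is missing from the writer schema and has no default"
            )
    # Fields present only in the writer schema are silently dropped.
    return resolved


def read_batch(messages, latest_reader_fields):
    """Read (writer_schema, record) pairs so every row has the latest schema."""
    return [
        resolve_record(record, writer_fields, latest_reader_fields)
        for writer_fields, record in messages
    ]


if __name__ == "__main__":
    v1 = {"id": NO_DEFAULT, "name": NO_DEFAULT}
    v2 = {"id": NO_DEFAULT, "name": NO_DEFAULT, "email": None}  # evolved schema
    batch = [
        (v1, {"id": 1, "name": "a"}),
        (v2, {"id": 2, "name": "b", "email": "b@example.org"}),
    ]
    # Read as written, the batch mixes two record shapes; resolved, it has one.
    for row in read_batch(batch, v2):
        print(row)
```

Without the resolution step, the two records in the batch carry different field sets, which is the situation the description says must otherwise be handled at Parquet write time. In practice, Avro's Java GenericDatumReader can be constructed with both a writer schema and a reader schema and applies these resolution rules itself, so a hand-written projection like this one is only for illustration.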



--
This message was sent by Atlassian Jira
(v8.3.4#803005)