Posted to commits@hudi.apache.org by "Vinoth Chandar (Jira)" <ji...@apache.org> on 2021/01/21 06:12:00 UTC

[jira] [Updated] (HUDI-837) Fix AvroKafkaSource to use the latest schema for reading

     [ https://issues.apache.org/jira/browse/HUDI-837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vinoth Chandar updated HUDI-837:
--------------------------------
    Fix Version/s:     (was: 0.7.0)
                   0.8.0

> Fix AvroKafkaSource to use the latest schema for reading
> --------------------------------------------------------
>
>                 Key: HUDI-837
>                 URL: https://issues.apache.org/jira/browse/HUDI-837
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: DeltaStreamer
>            Reporter: Pratyaksh Sharma
>            Assignee: Pratyaksh Sharma
>            Priority: Major
>              Labels: bug-bash-0.6.0, pull-request-available
>             Fix For: 0.8.0
>
>
> Currently we specify KafkaAvroDeserializer as the value for value.deserializer in AvroKafkaSource. This means each published record is read using the schema it was written with, even if the schema has evolved in between. As a result, messages in an incoming batch can have different schemas, and this has to be handled when the records are actually written to Parquet.
> This Jira aims to provide an option to read all the messages in a batch with the same (latest) schema by implementing a new custom deserializer class.
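To make the difference concrete, here is a minimal Python sketch of the idea behind the proposed option. Every record in a batch is projected onto one reader schema (the latest one) instead of keeping the schema it was written with. It models only a subset of Avro's schema-resolution rules on already-decoded dicts: fields are matched by name, fields the reader does not know are dropped, and fields missing from the record take the reader's default. The schema representation and the names SCHEMA_V1, SCHEMA_V2, NO_DEFAULT, resolve and read_batch are hypothetical and for illustration only. They are not Hudi's deserializer or the Confluent KafkaAvroDeserializer API.

```python
from typing import Any, Dict, List

# Hypothetical, simplified schemas: field name -> default value.
# Real Avro schemas also carry types; type promotion and aliases are ignored here.
NO_DEFAULT = object()  # sentinel: the field has no default in the reader schema

SCHEMA_V1 = {"id": NO_DEFAULT, "name": NO_DEFAULT}
SCHEMA_V2 = {"id": NO_DEFAULT, "name": NO_DEFAULT, "email": "unknown"}  # latest


def resolve(record: Dict[str, Any], reader_schema: Dict[str, Any]) -> Dict[str, Any]:
    """Project one decoded record onto the reader schema."""
    out: Dict[str, Any] = {}
    for field, default in reader_schema.items():
        if field in record:
            out[field] = record[field]  # present in the writer's data: keep it
        elif default is NO_DEFAULT:
            raise ValueError(f"field {field!r} missing from record and has no default")
        else:
            out[field] = default  # added in a newer schema: fill from the reader default
    # Fields only the writer knew about are dropped because we iterate the reader schema.
    return out


def read_batch(batch: List[Dict[str, Any]], latest_schema: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Read a mixed-version batch so that every record ends up with the same schema."""
    return [resolve(record, latest_schema) for record in batch]


if __name__ == "__main__":
    batch = [
        {"id": 1, "name": "a"},                           # written with SCHEMA_V1
        {"id": 2, "name": "b", "email": "b@example.com"},  # written with SCHEMA_V2
    ]
    for row in read_batch(batch, SCHEMA_V2):
        print(row)
```

With the per-record writer schema (the current behaviour described above), the two records would reach the Parquet writer with different field sets. Resolving against the latest schema gives both the same fields, which is the property the new option is meant to provide.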



--
This message was sent by Atlassian Jira
(v8.3.4#803005)