Posted to commits@hudi.apache.org by "Pratyaksh Sharma (Jira)" <ji...@apache.org> on 2020/04/24 21:08:00 UTC

[jira] [Created] (HUDI-837) Fix KafkaAvroSource to use the latest schema for reading

Pratyaksh Sharma created HUDI-837:
-------------------------------------

             Summary: Fix KafkaAvroSource to use the latest schema for reading
                 Key: HUDI-837
                 URL: https://issues.apache.org/jira/browse/HUDI-837
             Project: Apache Hudi (incubating)
          Issue Type: Improvement
          Components: DeltaStreamer
            Reporter: Pratyaksh Sharma
            Assignee: Pratyaksh Sharma
             Fix For: 0.6.0


Currently, AvroKafkaSource sets value.deserializer to KafkaAvroDeserializer. As a result, each published record is read with the same schema it was written with, even if the schema has evolved in the meantime. Messages in a single incoming batch can therefore have different schemas, and this has to be handled when the records are actually written to parquet.

This Jira aims to provide an option to read all messages with the same schema, namely the latest one, by implementing a new custom deserializer class.
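
To make the difference concrete, here is a minimal, dependency-free sketch of the two reading strategies. It only models the idea of resolving a writer schema against a single reader schema; it does not use the Avro, Confluent, or Hudi APIs. The schema versions, the registry dictionary, the field names, and both function names are hypothetical and chosen purely for illustration.

```python
# Toy model of schema resolution. Not the Avro or Confluent API.

REQUIRED = object()  # sentinel: the field has no default value

# Hypothetical schema versions: field name -> default value.
SCHEMA_V1 = {"id": REQUIRED, "name": REQUIRED}
SCHEMA_V2 = {"id": REQUIRED, "name": REQUIRED, "city": "unknown"}  # v2 adds a defaulted field

# Hypothetical stand-in for a schema registry: schema id -> schema.
REGISTRY = {1: SCHEMA_V1, 2: SCHEMA_V2}

# Each message carries the id of the schema it was written with, plus its payload.
BATCH = [
    (1, {"id": 1, "name": "a"}),
    (2, {"id": 2, "name": "b", "city": "Pune"}),
]


def read_with_writer_schema(message):
    """Behaviour described in the issue: each record keeps its writer's shape."""
    schema_id, payload = message
    writer = REGISTRY[schema_id]
    return {field: payload[field] for field in writer}


def read_with_latest_schema(message, reader=SCHEMA_V2):
    """Proposed behaviour: project every record onto one reader schema."""
    schema_id, payload = message
    writer = REGISTRY[schema_id]
    record = {}
    for field, default in reader.items():
        if field in writer:
            # Field known to the writer: take the written value.
            record[field] = payload[field]
        elif default is not REQUIRED:
            # Field added after the record was written: fall back to the default.
            record[field] = default
        else:
            raise ValueError(f"cannot resolve field {field!r}: no value and no default")
    return record


mixed = [read_with_writer_schema(m) for m in BATCH]
uniform = [read_with_latest_schema(m) for m in BATCH]
```

In this sketch, `mixed` contains records with two different sets of fields, which is the situation the writer currently has to handle, while every record in `uniform` has the fields of the latest schema, with the default filled in for the older record.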



--
This message was sent by Atlassian Jira
(v8.3.4#803005)