Posted to commits@hudi.apache.org by "Pratyaksh Sharma (Jira)" <ji...@apache.org> on 2020/04/24 21:08:00 UTC
[jira] [Created] (HUDI-837) Fix KafkaAvroSource to use the latest schema for reading
Pratyaksh Sharma created HUDI-837:
-------------------------------------
Summary: Fix KafkaAvroSource to use the latest schema for reading
Key: HUDI-837
URL: https://issues.apache.org/jira/browse/HUDI-837
Project: Apache Hudi (incubating)
Issue Type: Improvement
Components: DeltaStreamer
Reporter: Pratyaksh Sharma
Assignee: Pratyaksh Sharma
Fix For: 0.6.0
Currently, AvroKafkaSource sets value.deserializer to KafkaAvroDeserializer. As a result, each published record is read with the same schema it was written with, even if the schema has evolved in the meantime, so the messages in an incoming batch can have different schemas. This then has to be handled when the records are actually written to Parquet.
This Jira aims to provide an option to read all messages with the same (latest) schema by implementing a new custom deserializer class.
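
To illustrate the intent, here is a minimal sketch of what "reading every message with the latest schema" means for a batch. It is a toy model built on plain Java maps, not the Avro or Hudi API: the class SchemaResolutionSketch, its Field type, and the project method are hypothetical names introduced only for this example. It assumes the usual reader/writer resolution rule: fields missing from an older record take the default declared in the latest schema, and fields the latest schema no longer declares are dropped. Real Avro resolution also covers type promotion, aliases, and errors for fields without defaults, all of which this sketch ignores.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of reader/writer schema resolution. Hypothetical names, not the Avro API.
final class SchemaResolutionSketch {

    // One field of the latest (reader) schema: its name and the default
    // used when a record written with an older schema lacks the field.
    static final class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    // Project a record, whatever schema it was written with, onto the latest schema.
    static Map<String, Object> project(Map<String, Object> record, List<Field> latestSchema) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Field f : latestSchema) {
            // Keep the written value if present, otherwise fall back to the default.
            out.put(f.name, record.containsKey(f.name) ? record.get(f.name) : f.defaultValue);
        }
        // Fields not declared in latestSchema are never copied, so they are dropped.
        return out;
    }

    public static void main(String[] args) {
        List<Field> latest = Arrays.asList(
                new Field("id", 0),
                new Field("email", "unknown"));

        // A record written before the "email" field was added.
        Map<String, Object> oldRecord = new HashMap<>();
        oldRecord.put("id", 7);

        // A record written with the latest schema.
        Map<String, Object> newRecord = new HashMap<>();
        newRecord.put("id", 8);
        newRecord.put("email", "a@example.org");

        // After projection both records expose the same set of fields.
        System.out.println(project(oldRecord, latest));
        System.out.println(project(newRecord, latest));
    }
}
```

With such a projection applied at read time, every record in a batch has the same shape before it reaches the Parquet writer, which is the effect the proposed custom deserializer is meant to achieve.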
--
This message was sent by Atlassian Jira
(v8.3.4#803005)