Posted to issues@spark.apache.org by "Gianluca Amori (JIRA)" <ji...@apache.org> on 2019/04/18 10:57:00 UTC
[jira] [Created] (SPARK-27506) Function `from_avro` doesn't allow deserialization of data using other compatible schemas
Gianluca Amori created SPARK-27506:
--------------------------------------
Summary: Function `from_avro` doesn't allow deserialization of data using other compatible schemas
Key: SPARK-27506
URL: https://issues.apache.org/jira/browse/SPARK-27506
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 2.4.1
Reporter: Gianluca Amori
SPARK-24768 and its subtasks introduced support for reading and writing Avro data by parsing a binary column of Avro format and converting it into its corresponding Catalyst value (and vice versa).
The current implementation has the limitation of requiring that an event be deserialized with the exact same schema with which it was serialized. This breaks one of the most important features of Avro, schema evolution [https://docs.confluent.io/current/schema-registry/avro.html] - most importantly, the ability to read old data with a newer (compatible) schema without breaking the consumer.
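As a concrete illustration (the schemas below are invented for this example, not taken from the issue), a typical backward-compatible evolution adds a field with a default value. A writer's schema might look like:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"}
  ]
}
```

An evolved reader's schema adds a defaulted field, and per Avro's schema resolution rules can still decode records written with the old schema:

```json
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "email", "type": "string", "default": ""}
  ]
}
```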
The GenericDatumReader in the Avro library already supports passing an optional *writer's schema* (the schema with which the record was serialized) alongside the mandatory *reader's schema* (the schema with which the record is going to be deserialized). The proposed change is to do the same in the from_avro function, allowing an optional writer's schema to be passed and used during deserialization.
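A minimal sketch of the Avro-library behavior the issue refers to (the `User` schemas here are hypothetical, chosen only to show the mechanism): a record serialized with the writer's schema is deserialized through a GenericDatumReader constructed with both schemas, and Avro's schema resolution fills the reader-only field from its default.

```java
import java.io.ByteArrayOutputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SchemaEvolutionSketch {
    public static void main(String[] args) throws Exception {
        // Writer's schema: what the producer used to serialize the record.
        Schema writerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"}]}");

        // Reader's schema: a compatible evolution that adds a defaulted field.
        Schema readerSchema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"\"}]}");

        // Serialize a record using only the writer's schema.
        GenericRecord written = new GenericData.Record(writerSchema);
        written.put("id", 42L);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(writerSchema).write(written, encoder);
        encoder.flush();

        // Deserialize with BOTH schemas: the reader resolves the differences
        // and supplies the missing "email" field from the reader schema's default.
        Decoder decoder = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        GenericRecord read = reader.read(null, decoder);

        System.out.println(read.get("id"));    // 42
        System.out.println(read.get("email")); // the defaulted value, ""
    }
}
```

The proposal is for `from_avro` to expose the same two-schema form that this constructor already provides, rather than assuming the writer's and reader's schemas are identical.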
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)