
Posted to issues@spark.apache.org by "Apache Spark (Jira)" <ji...@apache.org> on 2021/09/15 22:14:00 UTC

[jira] [Assigned] (SPARK-34378) Support extra optional Avro fields in AvroSerializer

     [ https://issues.apache.org/jira/browse/SPARK-34378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-34378:
------------------------------------

    Assignee: Apache Spark

> Support extra optional Avro fields in AvroSerializer
> ----------------------------------------------------
>
>                 Key: SPARK-34378
>                 URL: https://issues.apache.org/jira/browse/SPARK-34378
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.0.1
>            Reporter: Erik Krogen
>            Assignee: Apache Spark
>            Priority: Major
>
> Currently, when writing out Avro data using a custom schema ({{avroSchema}}), serialization fails if the Avro schema contains any extra fields that have no matching field in the Catalyst schema. This is much stricter than the deserialization path, where Avro fields not present in the Catalyst schema are ignored, and Catalyst fields not present in the Avro schema are allowed as long as they are nullable. I believe it would be more user-friendly to allow extra Avro fields on the serialization path as long as they are optional. This would make it easier for users to write out data with Avro schemas that may be outside of their control.
> If there is concern about the safety of this approach (i.e., there are use cases where users want strict matching), we could make this behavior configurable.
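The proposed serialization rule can be illustrated with a small, self-contained sketch. This is not Spark's AvroSerializer or the Avro library API: the types `CatalystField` and `AvroType`, the functions `check_serializable` and `to_avro_record`, and the `allow_extra_optional` flag are all hypothetical. The sketch assumes that "optional" means an Avro union containing "null", that every Catalyst field must still map to an Avro field, and that an extra optional Avro field is written as null. Setting `allow_extra_optional=False` models the strict, configurable behavior mentioned above.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

# Hypothetical, simplified schema model (NOT Spark's or Avro's real API):
# an Avro type is either a primitive name such as "long", or a union such as ["null", "string"].
AvroType = Union[str, List[str]]


@dataclass
class CatalystField:
    name: str
    nullable: bool


def is_optional(avro_type: AvroType) -> bool:
    """Assumption: an Avro field is 'optional' if its type is a union that includes "null"."""
    return isinstance(avro_type, list) and "null" in avro_type


def check_serializable(
    catalyst: List[CatalystField],
    avro: Dict[str, AvroType],
    allow_extra_optional: bool = True,  # hypothetical switch; False models strict matching
) -> List[str]:
    """Return a list of validation errors (an empty list means the schemas are compatible)."""
    errors = []
    catalyst_names = {f.name for f in catalyst}
    # Every Catalyst field must have somewhere to go in the Avro record.
    for f in catalyst:
        if f.name not in avro:
            errors.append(f"Catalyst field '{f.name}' has no matching Avro field")
    # Extra Avro fields are tolerated only if they are optional and the lenient mode is on.
    for name, avro_type in avro.items():
        if name in catalyst_names:
            continue
        if allow_extra_optional and is_optional(avro_type):
            continue  # will be written as null
        errors.append(f"Avro field '{name}' has no matching Catalyst field")
    return errors


def to_avro_record(
    row: Dict[str, object],
    catalyst: List[CatalystField],
    avro: Dict[str, AvroType],
    allow_extra_optional: bool = True,
) -> Dict[str, object]:
    """Build an output record with one entry per Avro field; extra optional fields get None."""
    errors = check_serializable(catalyst, avro, allow_extra_optional)
    if errors:
        raise ValueError("; ".join(errors))
    # Matched fields take the row's value; extra optional fields are absent from the row -> None.
    return {name: row.get(name) for name in avro}
```

For example, with a Catalyst schema of `id` and `name` and an Avro schema that additionally declares `email` as `["null", "string"]`, the lenient check passes and `email` is emitted as null, while a strict check, or an extra non-union field such as `"age": "int"`, is rejected.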



--
This message was sent by Atlassian Jira
(v8.3.4#803005)