Posted to dev@avro.apache.org by "Oscar Westra van Holthe - Kind (Jira)" <ji...@apache.org> on 2022/07/21 11:19:00 UTC

[jira] [Comment Edited] (AVRO-2918) Schema polymorphism

    [ https://issues.apache.org/jira/browse/AVRO-2918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17569357#comment-17569357 ] 

Oscar Westra van Holthe - Kind edited comment on AVRO-2918 at 7/21/22 11:18 AM:
--------------------------------------------------------------------------------

For polymorphism, as I see it, there are basically two options:
 # Limit polymorphism to schema definitions. This means that for arrays, one must manually make a union for the elements.
Pro: encoding & decoding need not change
Con: creating these unions is cumbersome and error-prone
 # Use a _discriminator_: a field whose value determines which subclass to use.
(The discriminator field must come before the subschema fields, and all subschemas must be known, which effectively seals the schema against later extensions.)
Pro: no explicit union needed
Con: encoding & decoding changes 
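
To make option 1 concrete, here is a minimal sketch that builds such a schema as plain JSON using only the Python standard library. The record names (Dog, Cat, Shelter) and their fields are hypothetical; they are not part of the issue. Because Avro has no inheritance today, each "subtype" repeats the shared field, and the array's element type is a hand-written union.

```python
import json

# Hypothetical "subtypes". Each record repeats the shared field `name`,
# since there is no parent schema to inherit it from.
dog = {
    "type": "record",
    "name": "Dog",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "barkVolume", "type": "int"},
    ],
}
cat = {
    "type": "record",
    "name": "Cat",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "livesLeft", "type": "int"},
    ],
}

# Option 1: the element type of the array is a manually maintained union
# (a JSON array of schemas) listing every subtype.
shelter = {
    "type": "record",
    "name": "Shelter",
    "fields": [
        {"name": "animals", "type": {"type": "array", "items": [dog, cat]}},
    ],
}

print(json.dumps(shelter, indent=2))
```

Every new subtype has to be added to every such union by hand, which is where the "cumbersome and error-prone" part comes from.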

(Silently) dropping fields that don't fit the parent type is not a valid option IMHO, as it changes the data.

I like the discriminator option, but it does involve quite a bit of work.

First of all, because the discriminator field decides which subschema to use, we must create a schema *set*: a schema with all subschemas in a single definition.
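
A toy sketch of what decoding against such a schema set could look like. Everything here is an assumption for illustration: the SCHEMA_SET layout, the field names, and the flat list of values standing in for the wire format are all made up, and none of it is valid Avro or a proposed syntax. It only shows why the decoder needs the complete set up front.

```python
# Hypothetical schema set: NOT Avro syntax, only an illustration.
SCHEMA_SET = {
    "name": "Animal",
    "common_fields": ["name"],
    "discriminator": "kind",
    "subschemas": {
        "dog": ["barkVolume"],
        "cat": ["livesLeft"],
    },
}


def decode(values, schema_set=SCHEMA_SET):
    """Turn a flat sequence of values into a dict of field name -> value.

    Assumed order: common fields, then the discriminator, then the fields
    of the subschema that the discriminator selects.
    """
    it = iter(values)
    record = {field: next(it) for field in schema_set["common_fields"]}
    kind = next(it)
    record[schema_set["discriminator"]] = kind
    # The decoder can only continue if the discriminator names a known
    # subschema; this is what seals the set against later extensions.
    try:
        sub_fields = schema_set["subschemas"][kind]
    except KeyError:
        raise ValueError(f"unknown subschema: {kind!r}") from None
    for field in sub_fields:
        record[field] = next(it)
    return record


print(decode(["Rex", "dog", 7]))
```

Because the discriminator is read before the subschema fields, the reader knows how many values follow and what they mean; a value outside the set cannot be decoded at all.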

Next, such a change in definition not only updates the spec, but also affects all schema parsing and the encoding and decoding of values. This is quite a big change.

All in all, it may be more effective to create a union field containing one of several records, each containing the unique fields of a subschema.
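
As a sketch of that alternative, using only constructs Avro already has (records and unions): the shared fields live in one record, and a union field holds one of several records with the subtype-specific fields. The names Animal, details, DogDetails and CatDetails are hypothetical, as is the small helper that lists the union branches.

```python
import json

# One record for the shared fields, plus a union field whose branches are
# records holding only the fields unique to each "subschema".
animal = {
    "type": "record",
    "name": "Animal",
    "fields": [
        {"name": "name", "type": "string"},
        {
            "name": "details",
            "type": [
                {
                    "type": "record",
                    "name": "DogDetails",
                    "fields": [{"name": "barkVolume", "type": "int"}],
                },
                {
                    "type": "record",
                    "name": "CatDetails",
                    "fields": [{"name": "livesLeft", "type": "int"}],
                },
            ],
        },
    ],
}


def branch_names(schema, field_name):
    """Return the record names in the union type of the given field."""
    field = next(f for f in schema["fields"] if f["name"] == field_name)
    return [branch["name"] for branch in field["type"]]


print(json.dumps(animal, indent=2))
print(branch_names(animal, "details"))
```

The shared fields are declared once, and the existing union encoding already records which branch is present, so neither the spec nor the encoders and decoders need to change.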


> Schema polymorphism
> -------------------
>
>                 Key: AVRO-2918
>                 URL: https://issues.apache.org/jira/browse/AVRO-2918
>             Project: Apache Avro
>          Issue Type: New Feature
>          Components: logical types, misc, spec
>            Reporter: Jonathan Rapoport
>            Priority: Critical
>              Labels: features
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> Include the option to use named types as base types for a new schema, allowing for MRO generation and field inheritance.
> The benefits of this approach include:
>  * Defining a schema as validation for a certain wire allows the receiver to be certain of the structure of the data (this works today). However, an extension of this schema, or a schema that can be normalized to the original schema but contains additional information, cannot be sent over the same wire.
>  * Backwards compatibility through inheritance - you never break the old schema, thus allowing a long integration period, with no need to recode all processes familiar with the schema. The new schema will simply inherit the old one, and only add information.
>  * Allow for full data control through polymorphism, and the ability to replace structures within any supported language. 


