Posted to dev@avro.apache.org by "Darryl Green (JIRA)" <ji...@apache.org> on 2017/02/09 08:50:42 UTC

[jira] [Commented] (AVRO-1340) use default to allow old readers to specify default enum value when encountering new enum symbols

    [ https://issues.apache.org/jira/browse/AVRO-1340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859221#comment-15859221 ] 

Darryl Green commented on AVRO-1340:
------------------------------------

An enum (enumerated type) is, in type-theoretic terms, a tagged union of unit types. Interestingly, in Avro, the binary representation of
{code:javascript|title=enum}
{
  "type": "enum",
  "name": "Suit",
  "symbols": ["CLUBS", "HEARTS", "SPADES", "DIAMONDS"]
}
{code}
and
{code:javascript|title=union of unit types}
[ 
{ "type": "record", "name": "CLUBS", "fields": [] },
{ "type": "record", "name": "HEARTS", "fields": [] },
{ "type": "record", "name": "SPADES", "fields": [] },
{ "type": "record", "name": "DIAMONDS", "fields": [] }
]
{code}

is the same.
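
To make the equivalence concrete, here is a minimal Python sketch that hand-rolls the encoding instead of using an Avro library. It relies on the binary encoding rules as I understand them: an enum is written as the int position of its symbol, a union as the long position of the selected branch followed by the branch value, a record with no fields as zero bytes, and int and long share the same zigzag varint format. The function names are made up for illustration.

```python
SYMBOLS = ["CLUBS", "HEARTS", "SPADES", "DIAMONDS"]


def zigzag_varint(n: int) -> bytes:
    """Encode an integer as Avro encodes int and long: zigzag, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # zigzag mapping for values in the 64-bit range
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_enum(symbol: str) -> bytes:
    # An enum value is just the zero-based position of its symbol.
    return zigzag_varint(SYMBOLS.index(symbol))


def encode_union_of_empty_records(branch_name: str) -> bytes:
    # A union value is the zero-based branch position followed by the branch value.
    # Each branch here is a record with no fields, which contributes no bytes.
    empty_record = b""
    return zigzag_varint(SYMBOLS.index(branch_name)) + empty_record
```

Under these assumptions both functions produce identical bytes for every suit, which is the sense in which the two schemas share a binary representation.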

This gives a way (when using the union representation) to extend the enumeration and map new symbols (record names) to existing ones, but only IF the reader schema is updated with aliases for the new symbols. And if the reader can be updated anyway, why not just update it to the new enum schema...
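
A rough sketch of that reader-side mapping, in Python. It is a deliberate simplification of schema resolution that only matches record branches by name or reader alias; match_reader_branch and the JOKERS branch are hypothetical and used only for illustration.

```python
# Reader union, updated so that CLUBS also answers to the new writer name JOKERS.
READER_UNION = [
    {"type": "record", "name": "CLUBS", "aliases": ["JOKERS"], "fields": []},
    {"type": "record", "name": "HEARTS", "fields": []},
    {"type": "record", "name": "SPADES", "fields": []},
    {"type": "record", "name": "DIAMONDS", "fields": []},
]


def match_reader_branch(writer_branch_name, reader_union):
    """Return the name of the first reader branch matching the writer's branch."""
    for branch in reader_union:
        names = [branch["name"], *branch.get("aliases", [])]
        if writer_branch_name in names:
            return branch["name"]
    raise ValueError(f"no reader branch matches {writer_branch_name!r}")
```

With this reader, a JOKERS record written by a newer writer is read as CLUBS, but only because the reader schema itself was edited, which is exactly the limitation noted above.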

This is where the suggestion in [AVRO-1347] comes in: improve name and alias matching for named schemas by allowing aliases in the WRITER schema, which has been noted to allow unions in general to be extended. It starts to look like [~mtth]'s suggestion to have (writer) aliases for enum symbols, and it suggests a way to spell, in a writer schema, a symbol together with a substitute (alias) to use if the reader schema does not include that symbol:

An enum schema symbols list is represented as:  [<symbol>,<symbol>..,<symbol>]  where <symbol> is a string. If <symbol> were allowed to be an object having a name and an (optional) aliases field one could do this: 

{code:javascript|title=enum with aliases}
{
  "type": "enum",
  "name": "Suit",
  "symbols": ["CLUBS", "HEARTS", "SPADES", "DIAMONDS", {"name": "JOKERS", "aliases": ["CLUBS"] } ]
}
{code}

in a new writer schema that adds jokers as a distinct suit but with a fallback, so that if a joker is ever read by an old reader, it will be treated as though it were clubs...
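
A sketch of how a reader might resolve such a symbol, written in Python. The object form of a symbol is only the proposal made in this comment, not existing Avro syntax, and resolve_enum_symbol is a hypothetical helper rather than part of any Avro library.

```python
# Writer schema symbols using the proposed (hypothetical) object form for JOKERS.
WRITER_SYMBOLS = [
    "CLUBS", "HEARTS", "SPADES", "DIAMONDS",
    {"name": "JOKERS", "aliases": ["CLUBS"]},
]
OLD_READER_SYMBOLS = ["CLUBS", "HEARTS", "SPADES", "DIAMONDS"]


def resolve_enum_symbol(writer_symbols, index, reader_symbols):
    """Map the writer's symbol at `index` to a symbol the reader knows."""
    entry = writer_symbols[index]
    if isinstance(entry, str):
        name, aliases = entry, []
    else:
        name, aliases = entry["name"], entry.get("aliases", [])
    # Prefer the symbol's own name, then fall back to writer aliases in order.
    for candidate in [name, *aliases]:
        if candidate in reader_symbols:
            return candidate
    # No name or alias is known to the reader: signal an error, as happens today.
    raise ValueError(f"symbol {name!r} is not present in the reader's enum")
```

Here an old reader that receives index 4 would resolve it to CLUBS, while a reader whose schema already lists JOKERS would get JOKERS.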

It seems obvious that the symbols are the (unit type) fields of the enum "union" and that the syntax and semantics for extending it should be the same as for any other union. The fact that unions are not extensible is the same problem, and could reasonably be fixed the same way, i.e. by allowing writer aliases to be used when resolving.


> use default to allow old readers to specify default enum value when encountering new enum symbols
> -------------------------------------------------------------------------------------------------
>
>                 Key: AVRO-1340
>                 URL: https://issues.apache.org/jira/browse/AVRO-1340
>             Project: Avro
>          Issue Type: Improvement
>          Components: spec
>         Environment: N/A
>            Reporter: Jim Donofrio
>            Priority: Minor
>
> The schema resolution page says:
> > if both are enums:
> > if the writer's symbol is not present in the reader's enum, then an
> error is signalled.
> This makes it difficult to use enum's because you can never add a enum value and keep old reader's compatible. Why not use the default option to refer to one of enum values so that when a old reader encounters a enum ordinal it does not recognize, it can default to the optional schema provided one. If the old schema does not provide a default then the older reader can continue to fail as it does today.


