Posted to dev@avro.apache.org by Niels Basjes <Ni...@basjes.nl> on 2017/02/01 13:45:47 UTC

Re: [IDEA] Making schema evolution for enums slightly easier.

Thanks for the idea.
I'm gonna play around with that to see if it could work.

Niels

On Tue, Jan 31, 2017 at 5:57 PM, Ryan Blue <rb...@netflix.com.invalid>
wrote:

> If you want to solve this problem by using a String to encode the value,
> then you can do that by defining a logical type that is an enum-as-string.
> But I'm not sure you want to do that. The nice thing about an enum is that
> you use what you know about the schema ahead of time to get a much more
> compact representation -- usually a byte rather than encoding the entire
> string. So I'd much rather find a way of handling this case that keeps the
> compact representation, while allowing applications to handle these
> cases gracefully.
>
> For generic, enum symbols are translated to GenericEnumSymbol, which can
> hold any symbol. Adding an option to return the symbol from the writer's
> schema even if it isn't in the reader's schema is one way around the
> problem. That wouldn't work for reflect or specific, though.
>
> Another option that was suggested last year is to designate a catch-all
> enum symbol. So your enum would be { 'A', 'B', 'UNKNOWN' } and { 'A', 'B',
> 'C', 'UNKNOWN' }. When a v1 consumer reads v2 records, C gets turned into
> UNKNOWN.
>
> I like the designated catch-all symbol because it is a reasonable way to
> opt-in for forward-compatibility.
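The catch-all resolution described above can be sketched roughly like this (plain Python standing in for Avro's actual decoder; the function and variable names here are illustrative, not Avro's API). The index is what Avro actually puts on the wire for an enum:

```python
# Sketch of the catch-all idea: if the symbol decoded via the writer's
# schema is unknown to the reader, fall back to a designated symbol
# instead of raising an error.

READER_V1 = ['A', 'B', 'UNKNOWN']       # v1 schema symbols
WRITER_V2 = ['A', 'B', 'C', 'UNKNOWN']  # v2 schema symbols

def resolve_enum(index, writer_symbols, reader_symbols, fallback='UNKNOWN'):
    # Decode the wire value (a zero-based index) using the writer's schema,
    # then map unknown symbols to the agreed catch-all.
    symbol = writer_symbols[index]
    return symbol if symbol in reader_symbols else fallback

# A v1 consumer reading a v2 record holding 'C' (index 2 in the writer schema):
print(resolve_enum(2, WRITER_V2, READER_V1))  # prints UNKNOWN
```

The key design point is that the fallback is opt-in: both schema versions must agree on the catch-all symbol ahead of time.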
>
> rb
>
> On Tue, Jan 31, 2017 at 2:04 AM, Niels Basjes <Ni...@basjes.nl> wrote:
>
> > Hi,
> >
> > I'm working on a project where we are putting serialized Avro records
> > into Kafka. The schemas are made available via a schema registry of
> > some sort.
> > Because Kafka stores the messages for a longer period (weeks), we have
> > two common scenarios that occur when a new version of the schema is
> > introduced (i.e. from V1 to V2).
> >
> > 1) A V2 producer is released and a V1 consumer must be able to read the
> > records.
> > 2) A 'new' V2 consumer is released a few days after the V2 producer
> > started creating records. The V2 consumer starts reading Kafka "from the
> > beginning" and as a consequence first has to go through a set of V1
> > records.
> >
> > So in this use case we need schema evolution in both directions.
> >
> > To make sure it all works as expected, I did some experiments and found
> > that these requirements are all doable except when you are in need of
> > an enum.
> >
> > Supporting both directions turns out to be a problem when changing the
> > values of an enum.
> >
> > You cannot write an enum { 'A', 'B', 'C' } and then read it with the
> > schema enum { 'A', 'B' }.
> >
> >
> > So I was thinking about a possible way to make this easier for the
> > developer.
> >
> > The current idea that I want your opinion on:
> > 1) In the IDL we add a way of indicating that we want the enum to be
> > stored in a different way in the schema. I was thinking about something
> > like either defining a new type like 'string enum' or perhaps using an
> > annotation of some sort.
> > 2) The 'string enum' is mapped into the actual schema as a string (which
> > can contain ANY value). So anyone using the JSON schema can simply read
> > it because it is a string.
> > 3) The generated code that is used to set/change the value enforces that
> > only the allowed values can be set.
> >
> > This way a 'reader' can read any value, the schema is compatible in all
> > directions.
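A hypothetical sketch of what such generated code could look like (the class and symbol names are made up for illustration; this is not output of any real Avro code generator): the schema type stays a plain string, which any reader accepts, while the setter enforces the allowed symbols on the producer side.

```python
# Hypothetical 'string enum' wrapper as generated code might produce it.
# The field is serialized as an ordinary Avro string, so every reader
# schema version can decode it; only the setter knows the allowed set.

class Color:
    ALLOWED = frozenset({'A', 'B'})  # symbols known to this schema version

    def __init__(self):
        self._value = None

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, symbol):
        # Enforce the enum contract at write time, not in the wire format.
        if symbol not in self.ALLOWED:
            raise ValueError("not an allowed symbol: " + symbol)
        self._value = symbol

c = Color()
c.value = 'A'      # accepted
# c.value = 'C'    # would raise ValueError on the producer side
```

The trade-off, as noted earlier in the thread, is that this gives up the compact one-byte index encoding in exchange for compatibility in both directions.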
> >
> > What do you guys think?
> > Is this an idea worth trying out?
> >
> > --
> > Best regards / Met vriendelijke groeten,
> >
> > Niels Basjes
> >
>
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>



-- 
Best regards / Met vriendelijke groeten,

Niels Basjes