Posted to dev@avro.apache.org by Ryan Skraba <ry...@skraba.com> on 2019/08/26 09:31:41 UTC

Record names and schema evolution

Hello!  I've been going through some code that should be cleaned up if
https://issues.apache.org/jira/browse/AVRO-2492 is applied (removing
one of the deprecated record schema constructors).

In the meantime, I have a question about names in general.  I noticed
in the spec:

https://avro.apache.org/docs/1.9.0/spec.html#Schema+Resolution

<heavily snipped>
* To match, one of the following must hold:
  - both schemas are records with the same name
* If both are records:
  - <more criteria with respect to fields>

In 1.9.1, "the same name" was changed to "the same (unqualified) name"
(AVRO-2400)
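A minimal sketch of the 1.9.1 wording, assuming a record's full name is its namespace and simple name joined at the last '.'. RecordNameRule and its methods are hypothetical helpers for illustration, not part of the Avro API. Under this rule a namespace change still matches, but a rename such as AddressV1 to AddressV2 does not:

```java
// Hypothetical helper, NOT part of the Avro API: models the record-name
// condition from the 1.9.1 spec text ("the same (unqualified) name").
public final class RecordNameRule {

    // Strip the namespace from a full name: "com.example.Address" -> "Address".
    // A name without a '.' is returned unchanged.
    public static String unqualified(String fullName) {
        int dot = fullName.lastIndexOf('.');
        return dot < 0 ? fullName : fullName.substring(dot + 1);
    }

    // True when writer and reader record names match under the spec wording.
    public static boolean namesMatch(String writerFullName, String readerFullName) {
        return unqualified(writerFullName).equals(unqualified(readerFullName));
    }

    public static void main(String[] args) {
        // Different namespaces, same simple name: matches.
        System.out.println(namesMatch("com.example.Address", "org.other.Address"));
        // Renamed record: fails the spec rule as written.
        System.out.println(namesMatch("com.example.AddressV1", "com.example.AddressV2"));
    }
}
```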

For reading records, I have definitely observed that the reader and
writer schema can have different top-level record names and work
together successfully -- implying that the name isn't taken into
account at all.

Is the spec wrong, or the implementation?  Is this behaviour
consistent across named schemas?  I seem to recall that when resolving
a record against a union, the name is *preferred* if available.
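A rough sketch of what "preferring the name" in a union could look like, assuming a simplified two-pass search: first a record branch with the same full name, then the first record branch of any name. Branch and chooseRecordBranch are hypothetical, and this is not the Java implementation's actual code; a real resolver would presumably also check field compatibility in the fallback pass:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, NOT Avro's API: picks the union branch a writer's
// record resolves to, preferring a branch with the same full name.
public final class UnionBranchChooser {

    // Minimal stand-in for a schema: its type ("record", "string", ...) and full name.
    public static final class Branch {
        final String type;
        final String fullName; // null for unnamed types

        public Branch(String type, String fullName) {
            this.type = type;
            this.fullName = fullName;
        }
    }

    // Returns the index of the chosen branch, or -1 if the union has no record branch.
    public static int chooseRecordBranch(String writerFullName, List<Branch> union) {
        // Pass 1: prefer a record branch whose full name equals the writer's.
        for (int i = 0; i < union.size(); i++) {
            Branch b = union.get(i);
            if (b.type.equals("record") && writerFullName.equals(b.fullName)) {
                return i;
            }
        }
        // Pass 2: fall back to the first record branch, ignoring its name.
        for (int i = 0; i < union.size(); i++) {
            if (union.get(i).type.equals("record")) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Branch> union = Arrays.asList(
                new Branch("null", null),
                new Branch("record", "com.example.Person"),
                new Branch("record", "com.example.Address"));
        System.out.println(chooseRecordBranch("com.example.Address", union));   // name match preferred
        System.out.println(chooseRecordBranch("com.example.AddressV2", union)); // first record branch
    }
}
```

The two passes keep the name as a tie-breaker rather than a requirement, which is consistent with names being ignored for direct record-to-record resolution.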

All my best, Ryan

Re: Record names and schema evolution

Posted by Ryan Skraba <ry...@skraba.com>.
I mostly agree!

I would be very, very surprised if someone ever wrote code where the
correct behaviour relied on *failing* to resolve two schemas because
the spec says so!

I also think the Java implementation is doing the right and convenient
thing.  I've typically used new record names for new schemas
(AddressV1, AddressV2) and just expected it to work, and it does.

That being said -- in this case, there are no interoperability
concerns where schema evolution is involved, since the resolution rules
are by nature tied to one version of one implementation.  If there's value
in this part of the spec, it's because it would be *great* if all
implementations did roughly the same thing and we could point users to
this documentation!

Anyone have any knowledge about how closely the other implementations
follow the spec as written?  If everyone is ignoring record names with
direct record->record evolution, it would definitely be appropriate to
update the spec!

I'll take a look at the other named schemas (fixed, enum) and see if
they're consistent in this behaviour.

As it is, the change to include "the same (unqualified) name" seems
misleading if it was done for legacy Java reasons AND the name is
ignored when comparing two records...

All my best, Ryan


On Mon, Aug 26, 2019 at 7:47 PM Doug Cutting <cu...@gmail.com> wrote:
>
> This may be an example of Postel's Law, where neither the implementation
> nor the spec is wrong.  An implementation is allowed to accept more than
> the strictest interpretation of the spec.  Within reason, we prefer
> that folks can read data rather than get an error when trying.  (We also
> want them to be able to write data which can be read by the widest range of
> implementations.)
>
> Does the likelihood of harm in quietly accepting mismatched namespaces
> exceed convenience and back-compatibility here?
>
> Cheers,
>
> Doug

Re: Record names and schema evolution

Posted by Doug Cutting <cu...@gmail.com>.
This may be an example of Postel's Law, where neither the implementation
nor the spec is wrong.  An implementation is allowed to accept more than
the strictest interpretation of the spec.  Within reason, we prefer
that folks can read data rather than get an error when trying.  (We also
want them to be able to write data which can be read by the widest range of
implementations.)
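Postel's Law here can be pictured as a reader-side policy, assuming a reader that either enforces the spec's name rule (strict) or lets the field-by-field rules decide when names differ (lenient). LenientNameCheck, its Outcome enum and the strict parameter are all hypothetical; the sketch does not describe how any Avro implementation is actually configured:

```java
// Hypothetical policy sketch, NOT Avro code: "be liberal in what you accept"
// applied to record names during schema resolution.
public final class LenientNameCheck {

    public enum Outcome { SAME_NAME, DIFFERENT_NAME_ACCEPTED, REJECTED }

    // Unqualified (simple) name: text after the last '.', or the whole name.
    static String simpleName(String fullName) {
        return fullName.substring(fullName.lastIndexOf('.') + 1);
    }

    public static Outcome check(String writerFullName, String readerFullName, boolean strict) {
        if (simpleName(writerFullName).equals(simpleName(readerFullName))) {
            return Outcome.SAME_NAME; // satisfies the 1.9.1 spec wording
        }
        // Names differ: a strict reader fails fast; a lenient one lets the
        // field resolution rules decide (a real reader might log a warning).
        return strict ? Outcome.REJECTED : Outcome.DIFFERENT_NAME_ACCEPTED;
    }

    public static void main(String[] args) {
        System.out.println(check("a.AddressV1", "a.AddressV2", true));
        System.out.println(check("a.AddressV1", "a.AddressV2", false));
        System.out.println(check("a.Address", "b.Address", true));
    }
}
```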

Does the likelihood of harm in quietly accepting mismatched namespaces
exceed convenience and back-compatibility here?

Cheers,

Doug
