Posted to dev@avro.apache.org by Jeff Kolesky <je...@gmail.com> on 2012/11/17 00:37:52 UTC

Check for schema backwards compatibility

In AVRO-816, there is an implementation of a method that will check to see
if one schema subsumes another based on the field definitions.  I would
like a tool that can check if one schema is backwards compatible with
another -- that a record written with schema version 1 can be read with
schema version 2.

For instance, let's say I had a schema for a person record that originally
looked like this:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "password", "type": "string"},
    {"name": "joined_on", "type": "long"}
  ]
}

I want to change the schema by adding a field.  I know that this will be a
backwards compatible change as long as the field has a default set, but I
would like to have a tool that can verify this for me as a type of static
analysis.
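
For example (the exact field added here is only an illustration of the kind of
change I mean, not something already decided), version 2 of the schema might add
an email address with a default:

{
  "type": "record",
  "name": "Person",
  "fields": [
    {"name": "username", "type": "string"},
    {"name": "password", "type": "string"},
    {"name": "joined_on", "type": "long"},
    {"name": "email", "type": "string", "default": ""}
  ]
}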

Has there been discussion of the need for this type of tool?  Would other
people find it useful?

Thanks.

Jeff
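
As a concrete sketch of the guarantee being asked for here: the snippet below
(a hypothetical class, not part of the thread; the field values and the added
"email" field come from the illustration above) writes a record with the
original schema and reads it back with the evolved one, letting Avro's schema
resolution supply the default for the new field. It uses only the generic Java
API as it existed in Avro 1.7.

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class BackwardsCompatibilityDemo {
  public static void main(String[] args) throws IOException {
    Schema v1 = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"Person\", \"fields\": ["
            + "{\"name\": \"username\", \"type\": \"string\"},"
            + "{\"name\": \"password\", \"type\": \"string\"},"
            + "{\"name\": \"joined_on\", \"type\": \"long\"}]}");
    Schema v2 = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"Person\", \"fields\": ["
            + "{\"name\": \"username\", \"type\": \"string\"},"
            + "{\"name\": \"password\", \"type\": \"string\"},"
            + "{\"name\": \"joined_on\", \"type\": \"long\"},"
            + "{\"name\": \"email\", \"type\": \"string\", \"default\": \"\"}]}");

    // Write a record with the original (writer) schema; the values are made up.
    GenericRecord person = new GenericData.Record(v1);
    person.put("username", "jeff");
    person.put("password", "secret");
    person.put("joined_on", 1353110272000L);

    ByteArrayOutputStream out = new ByteArrayOutputStream();
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
    new GenericDatumWriter<GenericRecord>(v1).write(person, encoder);
    encoder.flush();

    // Read it back with the evolved (reader) schema; schema resolution
    // fills in the default value for the added "email" field.
    BinaryDecoder decoder =
        DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
    GenericRecord read =
        new GenericDatumReader<GenericRecord>(v1, v2).read(null, decoder);
    System.out.println(read);  // prints the record, including "email": ""
  }
}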

Re: Check for schema backwards compatibility

Posted by Doug Cutting <cu...@apache.org>.
Jeff,

I would not oppose such a refactoring so long as these nested classes
do not become visible outside of the package and compatibility is
retained, but it is not a high priority for me either.

I don't think breaking these nested classes into their own files would
even break binary compatibility.  There should be no references to
these classes outside of the avro jar.

Cheers,

Doug

On Mon, Nov 19, 2012 at 11:12 AM, Jeff Kolesky <je...@kolesky.com> wrote:
> Schema.java is an unfortunately large file.  Would it be a reasonable
> refactor (of course done as a separate unit of work) to pull out the nested
> classes of Schema children (NamedSchema, RecordSchema, ArraySchema, etc.)
> into their own files as package-protected classes?  It would make them more
> accessible than they are now as private classes, but it would allow the
> Schema.java file to be a more manageable size.
>
> Jeff
>
> On Mon, Nov 19, 2012 at 10:54 AM, Doug Cutting <cu...@apache.org> wrote:
>
>> I don't feel strongly: a method on Schema would be fine with me as
>> would an auxiliary tool class.  Schema.java is a huge file already,
>> but I'm not sure that really causes any problems.
>>
>> On Sat, Nov 17, 2012 at 8:04 PM, Jeff Kolesky <je...@kolesky.com> wrote:
>> > Would it be appropriate to add this method to the Schema class itself in
>> > the same way `subsume` and `unify` were, or would you rather see a
>> separate
>> > tool, similar to SchemaNormalization?
>> >
>> > On Fri, Nov 16, 2012 at 3:54 PM, Doug Cutting <cu...@apache.org>
>> wrote:
>> >
>> >> On Fri, Nov 16, 2012 at 3:37 PM, Jeff Kolesky <je...@gmail.com>
>> >> wrote:
>> >> > Has there been discussion of the need for this type of tool?  Would
>> other
>> >> > people find it useful?
>> >>
>> >> I have not seen this discussed, but I can see the utility.  One could
>> >> automatically check new schemas for compatibility with prior versions
>> >> before using them, to ensure that both old and new data can be read
>> >> with the new schema.  This would require checking that any added
>> >> fields have default values specified.
>> >>
>> >> Related is the ability to tell if an old schema can be used to read
>> >> data written with a newer.  This would require that any removed fields
>> >> have a default value specified.
>> >>
>> >> In general, to ensure readability in both cases, one should always
>> >> provide a default value for every field.  So a method that traversed a
>> >> schema and verified that each field has a default value might suffice.
>> >>
>> >> Doug
>> >>
>>

Re: Check for schema backwards compatibility

Posted by Jeff Kolesky <je...@kolesky.com>.
Schema.java is an unfortunately large file.  Would it be a reasonable
refactor (of course done as a separate unit of work) to pull out the nested
classes of Schema children (NamedSchema, RecordSchema, ArraySchema, etc.)
into their own files as package-protected classes?  It would make them more
accessible than they are now as private classes, but it would allow the
Schema.java file to be a more manageable size.

Jeff

On Mon, Nov 19, 2012 at 10:54 AM, Doug Cutting <cu...@apache.org> wrote:

> I don't feel strongly: a method on Schema would be fine with me as
> would an auxiliary tool class.  Schema.java is a huge file already,
> but I'm not sure that really causes any problems.
>
> On Sat, Nov 17, 2012 at 8:04 PM, Jeff Kolesky <je...@kolesky.com> wrote:
> > Would it be appropriate to add this method to the Schema class itself in
> > the same way `subsume` and `unify` were, or would you rather see a
> separate
> > tool, similar to SchemaNormalization?
> >
> > On Fri, Nov 16, 2012 at 3:54 PM, Doug Cutting <cu...@apache.org>
> wrote:
> >
> >> On Fri, Nov 16, 2012 at 3:37 PM, Jeff Kolesky <je...@gmail.com>
> >> wrote:
> >> > Has there been discussion of the need for this type of tool?  Would
> other
> >> > people find it useful?
> >>
> >> I have not seen this discussed, but I can see the utility.  One could
> >> automatically check new schemas for compatibility with prior versions
> >> before using them, to ensure that both old and new data can be read
> >> with the new schema.  This would require checking that any added
> >> fields have default values specified.
> >>
> >> Related is the ability to tell if an old schema can be used to read
> >> data written with a newer.  This would require that any removed fields
> >> have a default value specified.
> >>
> >> In general, to ensure readability in both cases, one should always
> >> provide a default value for every field.  So a method that traversed a
> >> schema and verified that each field has a default value might suffice.
> >>
> >> Doug
> >>
>

Re: Check for schema backwards compatibility

Posted by Doug Cutting <cu...@apache.org>.
I don't feel strongly: a method on Schema would be fine with me as
would an auxiliary tool class.  Schema.java is a huge file already,
but I'm not sure that really causes any problems.

On Sat, Nov 17, 2012 at 8:04 PM, Jeff Kolesky <je...@kolesky.com> wrote:
> Would it be appropriate to add this method to the Schema class itself in
> the same way `subsume` and `unify` were, or would you rather see a separate
> tool, similar to SchemaNormalization?
>
> On Fri, Nov 16, 2012 at 3:54 PM, Doug Cutting <cu...@apache.org> wrote:
>
>> On Fri, Nov 16, 2012 at 3:37 PM, Jeff Kolesky <je...@gmail.com>
>> wrote:
>> > Has there been discussion of the need for this type of tool?  Would other
>> > people find it useful?
>>
>> I have not seen this discussed, but I can see the utility.  One could
>> automatically check new schemas for compatibility with prior versions
>> before using them, to ensure that both old and new data can be read
>> with the new schema.  This would require checking that any added
>> fields have default values specified.
>>
>> Related is the ability to tell if an old schema can be used to read
>> data written with a newer.  This would require that any removed fields
>> have a default value specified.
>>
>> In general, to ensure readability in both cases, one should always
>> provide a default value for every field.  So a method that traversed a
>> schema and verified that each field has a default value might suffice.
>>
>> Doug
>>

Re: Check for schema backwards compatibility

Posted by Jeff Kolesky <je...@kolesky.com>.
Would it be appropriate to add this method to the Schema class itself in
the same way `subsume` and `unify` were, or would you rather see a separate
tool, similar to SchemaNormalization?
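
For context, SchemaNormalization is a set of static utility methods that operate
on Schema objects, which is the pattern a separate compatibility checker would
presumably follow. A minimal usage sketch (the record literal is just an example):

import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;

public class NormalizationExample {
  public static void main(String[] args) {
    Schema person = new Schema.Parser().parse(
        "{\"type\": \"record\", \"name\": \"Person\", \"fields\": ["
            + "{\"name\": \"username\", \"type\": \"string\"}]}");
    // Static helpers that take a Schema, rather than methods on Schema itself.
    System.out.println(SchemaNormalization.toParsingForm(person));        // canonical form
    System.out.println(SchemaNormalization.parsingFingerprint64(person)); // 64-bit fingerprint
  }
}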

On Fri, Nov 16, 2012 at 3:54 PM, Doug Cutting <cu...@apache.org> wrote:

> On Fri, Nov 16, 2012 at 3:37 PM, Jeff Kolesky <je...@gmail.com>
> wrote:
> > Has there been discussion of the need for this type of tool?  Would other
> > people find it useful?
>
> I have not seen this discussed, but I can see the utility.  One could
> automatically check new schemas for compatibility with prior versions
> before using them, to ensure that both old and new data can be read
> with the new schema.  This would require checking that any added
> fields have default values specified.
>
> Related is the ability to tell if an old schema can be used to read
> data written with a newer.  This would require that any removed fields
> have a default value specified.
>
> In general, to ensure readability in both cases, one should always
> provide a default value for every field.  So a method that traversed a
> schema and verified that each field has a default value might suffice.
>
> Doug
>

Re: Check for schema backwards compatibility

Posted by Doug Cutting <cu...@apache.org>.
On Fri, Nov 16, 2012 at 3:37 PM, Jeff Kolesky <je...@gmail.com> wrote:
> Has there been discussion of the need for this type of tool?  Would other
> people find it useful?

I have not seen this discussed, but I can see the utility.  One could
automatically check new schemas for compatibility with prior versions
before using them, to ensure that both old and new data can be read
with the new schema.  This would require checking that any added
fields have default values specified.

Related is the ability to tell if an old schema can be used to read
data written with a newer one.  This would require that any removed
fields have a default value specified.

In general, to ensure readability in both cases, one should always
provide a default value for every field.  So a method that traversed a
schema and verified that each field has a default value might suffice.

Doug
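
A minimal sketch of the traversal described above (not an existing Avro utility;
the class below is hypothetical, and it assumes the Avro 1.7-era
Schema.Field.defaultValue() accessor, which returns null when no default was
declared):

import java.util.HashSet;
import java.util.Set;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Field;

public class DefaultValueChecker {

  /** Returns true if every record field reachable from the given schema declares a default. */
  public static boolean allFieldsHaveDefaults(Schema schema) {
    return check(schema, new HashSet<String>());
  }

  private static boolean check(Schema schema, Set<String> visited) {
    switch (schema.getType()) {
    case RECORD:
      if (!visited.add(schema.getFullName())) {
        return true;                              // already visited: handles recursive records
      }
      boolean ok = true;
      for (Field field : schema.getFields()) {
        if (field.defaultValue() == null) {       // no default declared for this field
          System.err.println("missing default: "
              + schema.getFullName() + "." + field.name());
          ok = false;
        }
        ok &= check(field.schema(), visited);     // descend into nested records
      }
      return ok;
    case ARRAY:
      return check(schema.getElementType(), visited);
    case MAP:
      return check(schema.getValueType(), visited);
    case UNION:
      boolean unionOk = true;
      for (Schema branch : schema.getTypes()) {
        unionOk &= check(branch, visited);
      }
      return unionOk;
    default:
      return true;                                // primitives, enums, fixed: nothing to check
    }
  }
}

Applied to the original Person schema above, this would flag all three fields,
since none of them declares a default. A fuller compatibility check would also
confirm that the old and new schemas actually resolve against each other, for
example by constructing a GenericDatumReader with the old schema as the writer
schema and the new one as the reader schema, as in the round-trip sketch after
the first message above.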