Posted to user@avro.apache.org by KV 59 <kv...@gmail.com> on 2019/08/01 06:54:00 UTC

Re: union schema evolution

Hi Fokko,

Thanks for the response. I have been trying other things before responding.

This is an obscure and poorly documented feature of schema resolution. It
does work in 1.8.2 and also in 1.9.0 (there was an issue with my schema).
However, it is a very specific feature, and it doesn't work with primitives.

This specific case makes logical sense when the union branches form a type
hierarchy: I can define a base type, and the reader falls back to it by
default.
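To make the record case concrete, here is a simplified, hypothetical model of how a reader might pick a branch for a writer record it does not know by name: an exact name match wins; otherwise the first reader record whose fields the writer can all supply is used. As far as I understand, Avro's Java resolver does something along these lines, but it also checks field types and defaults and has extra tie-breaking rules, so treat this as a sketch under those assumptions, not Avro's code. The class and method names are made up for illustration, and the null branch is left out.

```java
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of record-branch selection; NOT Avro's implementation.
final class RecordBranchSketch {

    // A record reduced to its name and its field names (field types and defaults are ignored).
    static final class Rec {
        final String name;
        final Set<String> fields;

        Rec(String name, Set<String> fields) {
            this.name = name;
            this.fields = fields;
        }
    }

    // Returns the index of the chosen reader branch, or -1 if no branch fits.
    static int pickBranch(Rec writer, List<Rec> readerBranches) {
        // 1. Prefer a reader record with exactly the same name.
        for (int i = 0; i < readerBranches.size(); i++) {
            if (readerBranches.get(i).name.equals(writer.name)) {
                return i;
            }
        }
        // 2. Otherwise take the first reader record whose fields all exist in the writer record.
        for (int i = 0; i < readerBranches.size(); i++) {
            if (writer.fields.containsAll(readerBranches.get(i).fields)) {
                return i;
            }
        }
        return -1;
    }
}
```

With reader branches BaseOrg, Org and ExtendedOrg (in that order), a writer ExtendedOrg2 record resolves to index 0, i.e. the base type BaseOrg, because BaseOrg is the first branch whose fields it can supply. A primitive such as "string" has no fields to match structurally, which is why this fallback does not help in the primitive case.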

For example, my writer schema has an additional "string" branch in its
union:

{
  "type" : "record",
  "name" : "TestSample11",
  "namespace" : "com.kvajjala.avro.test.samples",
  "fields" : [ {
    "name" : "id",
    "type" : "long"
  }, {
    "name" : "name",
    "type" : "string"
  }, {
    "name" : "unionField",
    "type" : [ "null", "long", "string" ],
    "default" : null
  } ]
}

The reader schema is:

{
  "type" : "record",
  "name" : "TestSample11",
  "namespace" : "com.kvajjala.avro.test.samples",
  "fields" : [ {
    "name" : "id",
    "type" : "long"
  }, {
    "name" : "name",
    "type" : "string"
  }, {
    "name" : "unionField",
    "type" : [ "null", "long" ],
    "default" : null
  } ]
}


If my writer sets the string branch, the reader throws an exception (in
both 1.8.2 and 1.9.0):

org.apache.avro.AvroTypeException: Found string, expecting union
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:308)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:275)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:170)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
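This matches the union rule in the Avro specification: when both the writer and reader schemas are unions, the first reader branch that matches the writer's selected branch is used, and if none match, an error is signalled. Below is a simplified, hypothetical model of that rule with schemas reduced to type names; the class and method names are made up, and the Java library's actual branch selection may differ in detail.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of union-to-union resolution; schemas are reduced to type names.
final class UnionResolutionSketch {

    // Writer-to-reader promotions listed in the Avro specification.
    private static final Map<String, Set<String>> PROMOTIONS = Map.of(
        "int", Set.of("long", "float", "double"),
        "long", Set.of("float", "double"),
        "float", Set.of("double"),
        "string", Set.of("bytes"),
        "bytes", Set.of("string"));

    static boolean matches(String writerType, String readerType) {
        return writerType.equals(readerType)
            || PROMOTIONS.getOrDefault(writerType, Set.of()).contains(readerType);
    }

    // Returns the reader branch used for the writer's selected branch, or throws if none matches.
    static int resolve(List<String> writerUnion, int writerIndex, List<String> readerUnion) {
        String written = writerUnion.get(writerIndex);
        for (int i = 0; i < readerUnion.size(); i++) {
            if (matches(written, readerUnion.get(i))) {
                return i;
            }
        }
        throw new IllegalStateException("Found " + written + ", expecting union");
    }
}
```

With writer union ["null", "long", "string"] and reader union ["null", "long"], a written "long" resolves to reader branch 1, while a written "string" matches neither "null" nor "long" (string is only promotable to bytes), so the model fails just like the reader above.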



Why can't union resolution return null when it cannot find an appropriate
branch, or a branch of the same type? For a union that is bound to evolve, I
think null as the default (the first branch of the union) would be an
acceptable solution, and it would also work consistently for primitives.
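The proposal above could look roughly like this: if no reader branch matches, fall back to the reader union's first branch when that branch is null, instead of failing. This is a hypothetical sketch of the suggested behaviour, not how Avro currently resolves unions; the names are illustrative, and type promotions are ignored for brevity.

```java
import java.util.List;

// Hypothetical sketch of the proposed fallback rule; NOT current Avro behaviour.
final class NullFallbackSketch {

    // Returns the reader branch for the writer's selected branch; 0 means "read as null".
    static int resolveOrFallback(List<String> writerUnion, int writerIndex, List<String> readerUnion) {
        String written = writerUnion.get(writerIndex);
        int match = readerUnion.indexOf(written); // exact type-name match only
        if (match >= 0) {
            return match;
        }
        if (!readerUnion.isEmpty() && "null".equals(readerUnion.get(0))) {
            // The reader would skip the writer's encoded value and yield null.
            return 0;
        }
        throw new IllegalStateException("No branch for " + written + " and no null fallback");
    }
}
```

Skipping the unmatched value is possible in principle because resolution always has the writer's schema, which determines how the value is encoded. The sketch only models which branch gets selected.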

I'd appreciate a response.

Thanks
Kishore


On Mon, Jul 22, 2019 at 11:13 PM Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Kishore,
>
> The easiest way to find out is to compile it against Avro 1.8.2. This is a
> regression bug that will be fixed in 1.9.1, which will be released in the
> upcoming weeks.
>
> Cheers, Fokko
>
> On Mon, Jul 22, 2019 at 22:38, KV 59 <kv...@gmail.com> wrote:
>
>> Hi,
>>
>> I'm trying to define a union schema which is backward compatible.
>>
>> This is similar to the example in
>>
>>
>> http://apache-avro.679487.n3.nabble.com/Avro-union-compatibility-mode-enhancement-proposal-td4034377.html
>>
>> My original schema is as follows:
>>
>>
>>   record BaseOrg {
>>>     long orgId;
>>>   }
>>>   record Org {
>>>     long orgId;
>>>     string name;
>>>   }
>>>   record Address {
>>>      string city;
>>>      string state;
>>>   }
>>>   record ExtendedOrg {
>>>     long orgId;
>>>     string name;
>>>     string industry;
>>>   }
>>
>>
>>     union {
>>>       null,
>>>       BaseOrg,
>>>       Org,
>>>       ExtendedOrg
>>>     }  org=null;
>>
>>
>> This is just example code (I could have added some of these fields to
>> the original Org).
>>
>> Now I have evolved this schema to:
>>
>>>
>>>     union {
>>>       null,
>>>       BaseOrg,
>>>       Org,
>>>       ExtendedOrg,
>>>       ExtendedOrg2
>>>     }  org=null;
>>>   record ExtendedOrg2 {
>>>     long orgId;
>>>     string name;
>>>     boolean active;
>>>     string geography;
>>>     string industry;
>>>   }
>>
>>
>> I have a consumer with the old schema and a producer with the new schema.
>>
>> I saw this comment in the JIRA:
>>
>> https://issues.apache.org/jira/browse/AVRO-1590?focusedCommentId=14150780&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14150780
>>
>> Considering this, I would expect that when the producer sets the
>> "ExtendedOrg2" branch, a consumer that doesn't know that branch would match
>> it to one of BaseOrg, Org, or ExtendedOrg. Instead, I get the exception below:
>>
>> java.lang.ArrayIndexOutOfBoundsException: 4
>>> at
>>> org.apache.avro.io.parsing.Symbol$Alternative.getSymbol(Symbol.java:460)
>>> at
>>> org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:283)
>>> at
>>> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:178)
>>> at
>>> org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
>>> at
>>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:237)
>>> at
>>> org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
>>> at
>>> org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:170)
>>> at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
>>> at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:144)
>>> at com.five9.dataservices.avro.AgentReader.toEvent(AgentReader.java:101)
>>> at com.five9.dataservices.avro.AgentReader.read(AgentReader.java:83)
>>> at com.five9.dataservices.avro.AgentReader.main(AgentReader.java:59)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:282)
>>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>> I would like to know whether I have misunderstood the JIRA comments or
>> whether something is wrong with my schema.
>>
>> Thanks
>> Kishore
>>
>>