Posted to dev@avro.apache.org by Raghvendra Singh <rs...@appdynamics.com> on 2016/02/01 21:31:25 UTC

Avro schema doesn't honor backward compatibility

I have this Avro schema

{
 "namespace": "xx.xxxx.xxxxx.xxxxx",
 "type": "record",
 "name": "MyPayLoad",
 "fields": [
     {"name": "filed1",  "type": "string"},
     {"name": "filed2",     "type": "long"},
     {"name": "filed3",  "type": "boolean"},
     {
          "name" : "metrics",
          "type":
          {
             "type" : "array",
             "items":
             {
                 "name": "MyRecord",
                 "type": "record",
                 "fields" :
                     [
                       {"name": "min", "type": "long"},
                       {"name": "max", "type": "long"},
                       {"name": "sum", "type": "long"},
                       {"name": "count", "type": "long"}
                     ]
             }
          }
     }
  ]}

Here is the code which we use to parse the data

public static final MyPayLoad parseBinaryPayload(byte[] payload) {
    DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    MyPayLoad myPayLoad = null;
    try {
        myPayLoad = payloadReader.read(null, decoder);
    } catch (IOException e) {
        logger.log(Level.SEVERE, e.getMessage(), e);
    }

    return myPayLoad;
}

Now I want to add one more field in the schema, so the schema looks like
below:

 {
 "namespace": "xx.xxxx.xxxxx.xxxxx",
 "type": "record",
 "name": "MyPayLoad",
 "fields": [
     {"name": "filed1",  "type": "string"},
     {"name": "filed2",     "type": "long"},
     {"name": "filed3",  "type": "boolean"},
     {
          "name" : "metrics",
          "type":
          {
             "type" : "array",
             "items":
             {
                 "name": "MyRecord",
                 "type": "record",
                 "fields" :
                     [
                       {"name": "min", "type": "long"},
                       {"name": "max", "type": "long"},
                       {"name": "sum", "type": "long"},
                       {"name": "count", "type": "long"}
                     ]
             }
          }
      },
     {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
  ]}

Note the field that was added and also that the default is defined. The problem
is that if we receive data which was written using the older schema, I get this
error:

java.io.EOFException: null
    at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
~[avro-1.7.4.jar:1.7.4]
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
~[avro-1.7.4.jar:1.7.4]
    at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
~[blitz-shared.jar:na]

What I understood from this
<https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
document is that this should have been backward compatible, but somehow that
doesn't seem to be the case. Any idea what I am doing wrong?

Re: Avro schema doesn't honor backward compatibility

Posted by Ryan Blue <bl...@cloudera.com>.
Raghvendra,

Yes, you have to keep track of the schema that a blob of bytes was
written with if you want to read those bytes back correctly. If that's
inconvenient to keep track of, then I recommend keeping old versions
around (1.avsc, 2.avsc, ...) and adding the schema fingerprint to the
start of your serialized payload. That's commonly done when people use Avro
to serialize Kafka messages, and we're considering standardizing the
practice for interoperability.
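
A minimal sketch of that approach, assuming a hypothetical knownSchemas map
(populated from 1.avsc, 2.avsc, ...) and the MyPayLoad class from this thread;
the fingerprint comes from Avro's SchemaNormalization helper:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.SchemaNormalization;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

public class FingerprintedPayloads {

    // Prefix each payload with the 64-bit fingerprint of the writer schema.
    static byte[] write(MyPayLoad datum, Schema writerSchema) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long fp = SchemaNormalization.parsingFingerprint64(writerSchema);
        out.write(ByteBuffer.allocate(8).putLong(fp).array());
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new SpecificDatumWriter<MyPayLoad>(writerSchema).write(datum, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    // Look the writer schema up by fingerprint, then resolve to the reader schema.
    static MyPayLoad read(byte[] payload, Map<Long, Schema> knownSchemas,
                          Schema readerSchema) throws IOException {
        long fp = ByteBuffer.wrap(payload, 0, 8).getLong();
        Schema writerSchema = knownSchemas.get(fp);
        BinaryDecoder decoder = DecoderFactory.get()
                .binaryDecoder(payload, 8, payload.length - 8, null);
        return new SpecificDatumReader<MyPayLoad>(writerSchema, readerSchema)
                .read(null, decoder);
    }
}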

rb

On 02/02/2016 11:44 AM, Raghvendra Singh wrote:
> Hi Ryan
>
> Thanks for your answer. Here is what i am doing in my environment
>
> 1. Write the data using the old schema
>
> *SpecificDatumWriter<ControllerPayload> datumWriter = new
> SpecificDatumWriter<>(SCHEMA_V1)*
>
> 2. Now trying to read the data written by the old schema using the new
> schema
>
> *DatumReader<ControllerPayload> payloadReader = new SpecificDatumReader<>(*
> *SCHEMA_V2**)*
>
> In this case *SCHEMA_V1 *is the old schema which doesn't have the field
> while SCHEMA_V2 is the new one which has the extra field.
>
> Your suggestion *"You should run setSchema on your SpecificDatumReader to
> set the schema the data was written with"*  is kind of work around where i
> have to read the data with the schema it was written with and hence this is
> not exactly backward compatible. Note that if i do this then i have to
> maintain all the schemas while reading and somehow know which version the
> data was written with and hence this will make schema evolution pretty
> painful.
>
> Please let me know if i didn't understand your email correctly or their is
> something i missed.
>
> -raghu
>
> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <bl...@cloudera.com> wrote:
>
>> Hi Raghvendra,
>>
>> It looks like the problem is that you're using the new schema in place of
>> the schema that the data was written with.  You should run setSchema on
>> your SpecificDatumReader to set the schema the data was written with.
>>
>> What's happening is that the schema you're using, the new one, has the new
>> field so Avro assumes it is present and tries to read it. By setting the
>> schema that the data was actually written with, the datum reader will know
>> that it isn't present and will use your default instead. When you read data
>> encoded with the new schema, you need to use it as the written schema
>> instead so the datum reader knows that the field should be read.
>>
>> Does that make sense?
>>
>> rb
>>
>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>
>>> down votefavorite
>>> <
>>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>>>>
>>>
>>>
>>> I have this avro schema
>>>
>>> {
>>>    "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>    "type": "record",
>>>    "name": "MyPayLoad",
>>>    "fields": [
>>>        {"name": "filed1",  "type": "string"},
>>>        {"name": "filed2",     "type": "long"},
>>>        {"name": "filed3",  "type": "boolean"},
>>>        {
>>>             "name" : "metrics",
>>>             "type":
>>>             {
>>>                "type" : "array",
>>>                "items":
>>>                {
>>>                    "name": "MyRecord",
>>>                    "type": "record",
>>>                    "fields" :
>>>                        [
>>>                          {"name": "min", "type": "long"},
>>>                          {"name": "max", "type": "long"},
>>>                          {"name": "sum", "type": "long"},
>>>                          {"name": "count", "type": "long"}
>>>                        ]
>>>                }
>>>             }
>>>        }
>>>     ]}
>>>
>>> Here is the code which we use to parse the data
>>>
>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>           DatumReader<MyPayLoad> payloadReader = new
>>> SpecificDatumReader<>(MyPayLoad.class);
>>>           Decoder decoder = DecoderFactory.get().binaryDecoder(payload,
>>> null);
>>>           MyPayLoad myPayLoad = null;
>>>           try {
>>>               myPayLoad = payloadReader.read(null, decoder);
>>>           } catch (IOException e) {
>>>               logger.log(Level.SEVERE, e.getMessage(), e);
>>>           }
>>>
>>>           return myPayLoad;
>>>       }
>>>
>>> Now i want to add one more field int the schema so the schema looks like
>>> below
>>>
>>>    {
>>>    "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>    "type": "record",
>>>    "name": "MyPayLoad",
>>>    "fields": [
>>>        {"name": "filed1",  "type": "string"},
>>>        {"name": "filed2",     "type": "long"},
>>>        {"name": "filed3",  "type": "boolean"},
>>>        {
>>>             "name" : "metrics",
>>>             "type":
>>>             {
>>>                "type" : "array",
>>>                "items":
>>>                {
>>>                    "name": "MyRecord",
>>>                    "type": "record",
>>>                    "fields" :
>>>                        [
>>>                          {"name": "min", "type": "long"},
>>>                          {"name": "max", "type": "long"},
>>>                          {"name": "sum", "type": "long"},
>>>                          {"name": "count", "type": "long"}
>>>                        ]
>>>                }
>>>             }
>>>        }
>>>        {"name": "agentType",  "type": ["null", "string"], "default":
>>> "APP_AGENT"}
>>>     ]}
>>>
>>> Note the filed added and also the default is defined. The problem is that
>>> if we receive the data which was written using the older schema i get this
>>> error
>>>
>>> java.io.EOFException: null
>>>       at org.apache.avro.io
>>> .BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at org.apache.avro.io
>>> .BinaryDecoder.readIndex(BinaryDecoder.java:423)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at org.apache.avro.io
>>> .ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at org.apache.avro.io
>>> .ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at
>>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>       at
>>> com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
>>> ~[blitz-shared.jar:na]
>>>
>>> What i understood from this
>>> <
>>> https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>>>>
>>> document
>>> that this should have been backward compatible but somehow that doesn't
>>> seem to be the case. Any idea what i am doing wrong?
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: Avro schema doesn't honor backward compatibility

Posted by Raghvendra Singh <rs...@appdynamics.com>.
Great, Thank you very much guys, this works. Very much appreciated.

On Tue, Feb 2, 2016 at 12:46 PM, kulkarni.swarnim@gmail.com <
kulkarni.swarnim@gmail.com> wrote:

> Raghvendra,
>
> You need to use
>
> *DatumReader<ControllerPayload> payloadReader = new
> SpecificDatumReader<>(SCHEMA_V1, **SCHEMA_V2**)*
>
> So you provide both writer(SCHEMA_V1) and reader(SCHMEA_V2) to avro. In
> your current case avro is assuming both to be the same which is certainly
> not the case and hence it is failing. I think this is what Ryan was
> referring to as well.
>
> Hope that helps.
>
>
>
> On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rs...@appdynamics.com>
> wrote:
>
>> Hi Ryan
>>
>> Thanks for your answer. Here is what i am doing in my environment
>>
>> 1. Write the data using the old schema
>>
>> *SpecificDatumWriter<ControllerPayload> datumWriter = new
>> SpecificDatumWriter<>(SCHEMA_V1)*
>>
>> 2. Now trying to read the data written by the old schema using the new
>> schema
>>
>> *DatumReader<ControllerPayload> payloadReader = new
>> SpecificDatumReader<>(**SCHEMA_V2**)*
>>
>> In this case *SCHEMA_V1 *is the old schema which doesn't have the field
>> while SCHEMA_V2 is the new one which has the extra field.
>>
>> Your suggestion *"You should run setSchema on your SpecificDatumReader
>> to set the schema the data was written with"*  is kind of work around
>> where i have to read the data with the schema it was written with and hence
>> this is not exactly backward compatible. Note that if i do this then i have
>> to maintain all the schemas while reading and somehow know which version
>> the data was written with and hence this will make schema evolution pretty
>> painful.
>>
>> Please let me know if i didn't understand your email correctly or their
>> is something i missed.
>>
>> -raghu
>>
>> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <bl...@cloudera.com> wrote:
>>
>>> Hi Raghvendra,
>>>
>>> It looks like the problem is that you're using the new schema in place
>>> of the schema that the data was written with.  You should run setSchema on
>>> your SpecificDatumReader to set the schema the data was written with.
>>>
>>> What's happening is that the schema you're using, the new one, has the
>>> new field so Avro assumes it is present and tries to read it. By setting
>>> the schema that the data was actually written with, the datum reader will
>>> know that it isn't present and will use your default instead. When you read
>>> data encoded with the new schema, you need to use it as the written schema
>>> instead so the datum reader knows that the field should be read.
>>>
>>> Does that make sense?
>>>
>>> rb
>>>
>>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>>
>>>> down votefavorite
>>>> <
>>>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>>>> >
>>>>
>>>>
>>>> I have this avro schema
>>>>
>>>> {
>>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>>   "type": "record",
>>>>   "name": "MyPayLoad",
>>>>   "fields": [
>>>>       {"name": "filed1",  "type": "string"},
>>>>       {"name": "filed2",     "type": "long"},
>>>>       {"name": "filed3",  "type": "boolean"},
>>>>       {
>>>>            "name" : "metrics",
>>>>            "type":
>>>>            {
>>>>               "type" : "array",
>>>>               "items":
>>>>               {
>>>>                   "name": "MyRecord",
>>>>                   "type": "record",
>>>>                   "fields" :
>>>>                       [
>>>>                         {"name": "min", "type": "long"},
>>>>                         {"name": "max", "type": "long"},
>>>>                         {"name": "sum", "type": "long"},
>>>>                         {"name": "count", "type": "long"}
>>>>                       ]
>>>>               }
>>>>            }
>>>>       }
>>>>    ]}
>>>>
>>>> Here is the code which we use to parse the data
>>>>
>>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>>          DatumReader<MyPayLoad> payloadReader = new
>>>> SpecificDatumReader<>(MyPayLoad.class);
>>>>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload,
>>>> null);
>>>>          MyPayLoad myPayLoad = null;
>>>>          try {
>>>>              myPayLoad = payloadReader.read(null, decoder);
>>>>          } catch (IOException e) {
>>>>              logger.log(Level.SEVERE, e.getMessage(), e);
>>>>          }
>>>>
>>>>          return myPayLoad;
>>>>      }
>>>>
>>>> Now i want to add one more field int the schema so the schema looks like
>>>> below
>>>>
>>>>   {
>>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>>   "type": "record",
>>>>   "name": "MyPayLoad",
>>>>   "fields": [
>>>>       {"name": "filed1",  "type": "string"},
>>>>       {"name": "filed2",     "type": "long"},
>>>>       {"name": "filed3",  "type": "boolean"},
>>>>       {
>>>>            "name" : "metrics",
>>>>            "type":
>>>>            {
>>>>               "type" : "array",
>>>>               "items":
>>>>               {
>>>>                   "name": "MyRecord",
>>>>                   "type": "record",
>>>>                   "fields" :
>>>>                       [
>>>>                         {"name": "min", "type": "long"},
>>>>                         {"name": "max", "type": "long"},
>>>>                         {"name": "sum", "type": "long"},
>>>>                         {"name": "count", "type": "long"}
>>>>                       ]
>>>>               }
>>>>            }
>>>>       }
>>>>       {"name": "agentType",  "type": ["null", "string"], "default":
>>>> "APP_AGENT"}
>>>>    ]}
>>>>
>>>> Note the filed added and also the default is defined. The problem is
>>>> that
>>>> if we receive the data which was written using the older schema i get
>>>> this
>>>> error
>>>>
>>>> java.io.EOFException: null
>>>>      at org.apache.avro.io
>>>> .BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at org.apache.avro.io
>>>> .BinaryDecoder.readInt(BinaryDecoder.java:128)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at org.apache.avro.io
>>>> .BinaryDecoder.readIndex(BinaryDecoder.java:423)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at org.apache.avro.io
>>>> .ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at org.apache.avro.io
>>>> .ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at
>>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at
>>>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at
>>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at
>>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>>> ~[avro-1.7.4.jar:1.7.4]
>>>>      at
>>>> com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
>>>> ~[blitz-shared.jar:na]
>>>>
>>>> What i understood from this
>>>> <
>>>> https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>>>> >
>>>> document
>>>> that this should have been backward compatible but somehow that doesn't
>>>> seem to be the case. Any idea what i am doing wrong?
>>>>
>>>>
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Cloudera, Inc.
>>>
>>
>>
>
>
> --
> Swarnim
>

Re: Avro schema doesn't honor backward compatibility

Posted by "kulkarni.swarnim@gmail.com" <ku...@gmail.com>.
Raghvendra,

You need to use

DatumReader<ControllerPayload> payloadReader = new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2)

So you provide both the writer (SCHEMA_V1) and the reader (SCHEMA_V2) schema to
Avro. In your current case Avro assumes both to be the same, which is certainly
not the case, and hence it fails. I think this is what Ryan was
referring to as well.
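
As a minimal sketch, the parse method from the question would then look
something like this (assuming SCHEMA_V1 and SCHEMA_V2 are the parsed old and
new schemas, e.g. loaded from the .avsc files):

import java.io.IOException;

import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public static MyPayLoad parseBinaryPayload(byte[] payload) throws IOException {
    // Writer schema first (what the bytes were encoded with), reader schema second.
    DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(SCHEMA_V1, SCHEMA_V2);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    // The missing agentType field is filled in from its default.
    return payloadReader.read(null, decoder);
}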

Hope that helps.



On Tue, Feb 2, 2016 at 1:44 PM, Raghvendra Singh <rs...@appdynamics.com>
wrote:

> Hi Ryan
>
> Thanks for your answer. Here is what i am doing in my environment
>
> 1. Write the data using the old schema
>
> *SpecificDatumWriter<ControllerPayload> datumWriter = new
> SpecificDatumWriter<>(SCHEMA_V1)*
>
> 2. Now trying to read the data written by the old schema using the new
> schema
>
> *DatumReader<ControllerPayload> payloadReader = new SpecificDatumReader<>(*
> *SCHEMA_V2**)*
>
> In this case *SCHEMA_V1 *is the old schema which doesn't have the field
> while SCHEMA_V2 is the new one which has the extra field.
>
> Your suggestion *"You should run setSchema on your SpecificDatumReader to
> set the schema the data was written with"*  is kind of work around where
> i have to read the data with the schema it was written with and hence this
> is not exactly backward compatible. Note that if i do this then i have to
> maintain all the schemas while reading and somehow know which version the
> data was written with and hence this will make schema evolution pretty
> painful.
>
> Please let me know if i didn't understand your email correctly or their is
> something i missed.
>
> -raghu
>
> On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <bl...@cloudera.com> wrote:
>
>> Hi Raghvendra,
>>
>> It looks like the problem is that you're using the new schema in place of
>> the schema that the data was written with.  You should run setSchema on
>> your SpecificDatumReader to set the schema the data was written with.
>>
>> What's happening is that the schema you're using, the new one, has the
>> new field so Avro assumes it is present and tries to read it. By setting
>> the schema that the data was actually written with, the datum reader will
>> know that it isn't present and will use your default instead. When you read
>> data encoded with the new schema, you need to use it as the written schema
>> instead so the datum reader knows that the field should be read.
>>
>> Does that make sense?
>>
>> rb
>>
>> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>>
>>> down votefavorite
>>> <
>>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>>> >
>>>
>>>
>>> I have this avro schema
>>>
>>> {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields": [
>>>       {"name": "filed1",  "type": "string"},
>>>       {"name": "filed2",     "type": "long"},
>>>       {"name": "filed3",  "type": "boolean"},
>>>       {
>>>            "name" : "metrics",
>>>            "type":
>>>            {
>>>               "type" : "array",
>>>               "items":
>>>               {
>>>                   "name": "MyRecord",
>>>                   "type": "record",
>>>                   "fields" :
>>>                       [
>>>                         {"name": "min", "type": "long"},
>>>                         {"name": "max", "type": "long"},
>>>                         {"name": "sum", "type": "long"},
>>>                         {"name": "count", "type": "long"}
>>>                       ]
>>>               }
>>>            }
>>>       }
>>>    ]}
>>>
>>> Here is the code which we use to parse the data
>>>
>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>          DatumReader<MyPayLoad> payloadReader = new
>>> SpecificDatumReader<>(MyPayLoad.class);
>>>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload,
>>> null);
>>>          MyPayLoad myPayLoad = null;
>>>          try {
>>>              myPayLoad = payloadReader.read(null, decoder);
>>>          } catch (IOException e) {
>>>              logger.log(Level.SEVERE, e.getMessage(), e);
>>>          }
>>>
>>>          return myPayLoad;
>>>      }
>>>
>>> Now i want to add one more field int the schema so the schema looks like
>>> below
>>>
>>>   {
>>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>   "type": "record",
>>>   "name": "MyPayLoad",
>>>   "fields": [
>>>       {"name": "filed1",  "type": "string"},
>>>       {"name": "filed2",     "type": "long"},
>>>       {"name": "filed3",  "type": "boolean"},
>>>       {
>>>            "name" : "metrics",
>>>            "type":
>>>            {
>>>               "type" : "array",
>>>               "items":
>>>               {
>>>                   "name": "MyRecord",
>>>                   "type": "record",
>>>                   "fields" :
>>>                       [
>>>                         {"name": "min", "type": "long"},
>>>                         {"name": "max", "type": "long"},
>>>                         {"name": "sum", "type": "long"},
>>>                         {"name": "count", "type": "long"}
>>>                       ]
>>>               }
>>>            }
>>>       }
>>>       {"name": "agentType",  "type": ["null", "string"], "default":
>>> "APP_AGENT"}
>>>    ]}
>>>
>>> Note the filed added and also the default is defined. The problem is that
>>> if we receive the data which was written using the older schema i get
>>> this
>>> error
>>>
>>> java.io.EOFException: null
>>>      at org.apache.avro.io
>>> .BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at org.apache.avro.io
>>> .BinaryDecoder.readIndex(BinaryDecoder.java:423)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at org.apache.avro.io
>>> .ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at org.apache.avro.io
>>> .ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at
>>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at
>>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>>> ~[avro-1.7.4.jar:1.7.4]
>>>      at
>>> com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
>>> ~[blitz-shared.jar:na]
>>>
>>> What i understood from this
>>> <
>>> https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>>> >
>>> document
>>> that this should have been backward compatible but somehow that doesn't
>>> seem to be the case. Any idea what i am doing wrong?
>>>
>>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Cloudera, Inc.
>>
>
>


-- 
Swarnim

Re: Avro schema doesn't honor backward compatibility

Posted by Raghvendra Singh <rs...@appdynamics.com>.
Hi Ryan

Thanks for your answer. Here is what I am doing in my environment:

1. Write the data using the old schema

SpecificDatumWriter<ControllerPayload> datumWriter = new SpecificDatumWriter<>(SCHEMA_V1)

2. Now trying to read the data written by the old schema using the new
schema

DatumReader<ControllerPayload> payloadReader = new SpecificDatumReader<>(SCHEMA_V2)

In this case SCHEMA_V1 is the old schema, which doesn't have the field,
while SCHEMA_V2 is the new one, which has the extra field.

Your suggestion "You should run setSchema on your SpecificDatumReader to
set the schema the data was written with" is kind of a workaround where I
have to read the data with the schema it was written with, and hence this is
not exactly backward compatible. Note that if I do this then I have to
maintain all the schemas while reading and somehow know which version the
data was written with, and hence this will make schema evolution pretty
painful.

Please let me know if I didn't understand your email correctly or there is
something I missed.

-raghu

On Tue, Feb 2, 2016 at 9:19 AM, Ryan Blue <bl...@cloudera.com> wrote:

> Hi Raghvendra,
>
> It looks like the problem is that you're using the new schema in place of
> the schema that the data was written with.  You should run setSchema on
> your SpecificDatumReader to set the schema the data was written with.
>
> What's happening is that the schema you're using, the new one, has the new
> field so Avro assumes it is present and tries to read it. By setting the
> schema that the data was actually written with, the datum reader will know
> that it isn't present and will use your default instead. When you read data
> encoded with the new schema, you need to use it as the written schema
> instead so the datum reader knows that the field should be read.
>
> Does that make sense?
>
> rb
>
> On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
>
>> down votefavorite
>> <
>> http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#
>> >
>>
>>
>> I have this avro schema
>>
>> {
>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>   "type": "record",
>>   "name": "MyPayLoad",
>>   "fields": [
>>       {"name": "filed1",  "type": "string"},
>>       {"name": "filed2",     "type": "long"},
>>       {"name": "filed3",  "type": "boolean"},
>>       {
>>            "name" : "metrics",
>>            "type":
>>            {
>>               "type" : "array",
>>               "items":
>>               {
>>                   "name": "MyRecord",
>>                   "type": "record",
>>                   "fields" :
>>                       [
>>                         {"name": "min", "type": "long"},
>>                         {"name": "max", "type": "long"},
>>                         {"name": "sum", "type": "long"},
>>                         {"name": "count", "type": "long"}
>>                       ]
>>               }
>>            }
>>       }
>>    ]}
>>
>> Here is the code which we use to parse the data
>>
>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>          DatumReader<MyPayLoad> payloadReader = new
>> SpecificDatumReader<>(MyPayLoad.class);
>>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload,
>> null);
>>          MyPayLoad myPayLoad = null;
>>          try {
>>              myPayLoad = payloadReader.read(null, decoder);
>>          } catch (IOException e) {
>>              logger.log(Level.SEVERE, e.getMessage(), e);
>>          }
>>
>>          return myPayLoad;
>>      }
>>
>> Now i want to add one more field int the schema so the schema looks like
>> below
>>
>>   {
>>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>>   "type": "record",
>>   "name": "MyPayLoad",
>>   "fields": [
>>       {"name": "filed1",  "type": "string"},
>>       {"name": "filed2",     "type": "long"},
>>       {"name": "filed3",  "type": "boolean"},
>>       {
>>            "name" : "metrics",
>>            "type":
>>            {
>>               "type" : "array",
>>               "items":
>>               {
>>                   "name": "MyRecord",
>>                   "type": "record",
>>                   "fields" :
>>                       [
>>                         {"name": "min", "type": "long"},
>>                         {"name": "max", "type": "long"},
>>                         {"name": "sum", "type": "long"},
>>                         {"name": "count", "type": "long"}
>>                       ]
>>               }
>>            }
>>       }
>>       {"name": "agentType",  "type": ["null", "string"], "default":
>> "APP_AGENT"}
>>    ]}
>>
>> Note the filed added and also the default is defined. The problem is that
>> if we receive the data which was written using the older schema i get this
>> error
>>
>> java.io.EOFException: null
>>      at org.apache.avro.io
>> .BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at org.apache.avro.io
>> .BinaryDecoder.readIndex(BinaryDecoder.java:423)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at org.apache.avro.io
>> .ResolvingDecoder.doAction(ResolvingDecoder.java:229)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at org.apache.avro.io
>> .ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at
>> org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at
>> org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
>> ~[avro-1.7.4.jar:1.7.4]
>>      at
>> com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
>> ~[blitz-shared.jar:na]
>>
>> What i understood from this
>> <
>> https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html
>> >
>> document
>> that this should have been backward compatible but somehow that doesn't
>> seem to be the case. Any idea what i am doing wrong?
>>
>>
>
> --
> Ryan Blue
> Software Engineer
> Cloudera, Inc.
>

Re: Avro schema doesn't honor backward compatibility

Posted by Ryan Blue <bl...@cloudera.com>.
Hi Raghvendra,

It looks like the problem is that you're using the new schema in place 
of the schema that the data was written with.  You should run setSchema 
on your SpecificDatumReader to set the schema the data was written with.

What's happening is that the schema you're using, the new one, has the 
new field so Avro assumes it is present and tries to read it. By setting 
the schema that the data was actually written with, the datum reader 
will know that it isn't present and will use your default instead. When 
you read data encoded with the new schema, you need to use it as the 
written schema instead so the datum reader knows that the field should 
be read.
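
A minimal sketch of that suggestion, assuming SCHEMA_V1 is the parsed schema
the old data was written with (hypothetical name, as used elsewhere in this
thread):

import java.io.IOException;

import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public static MyPayLoad parseOldPayload(byte[] payload) throws IOException {
    // The reader (expected) schema comes from the generated MyPayLoad class.
    SpecificDatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
    // setSchema sets the writer schema: what the old bytes were actually encoded with.
    payloadReader.setSchema(SCHEMA_V1);
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    return payloadReader.read(null, decoder);   // agentType falls back to its default
}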

Does that make sense?

rb

On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
> down votefavorite
> <http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#>
>
> I have this avro schema
>
> {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       }
>    ]}
>
> Here is the code which we use to parse the data
>
> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>          DatumReader<MyPayLoad> payloadReader = new
> SpecificDatumReader<>(MyPayLoad.class);
>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>          MyPayLoad myPayLoad = null;
>          try {
>              myPayLoad = payloadReader.read(null, decoder);
>          } catch (IOException e) {
>              logger.log(Level.SEVERE, e.getMessage(), e);
>          }
>
>          return myPayLoad;
>      }
>
> Now i want to add one more field int the schema so the schema looks like
> below
>
>   {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       }
>       {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>    ]}
>
> Note the filed added and also the default is defined. The problem is that
> if we receive the data which was written using the older schema i get this
> error
>
> java.io.EOFException: null
>      at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
> ~[avro-1.7.4.jar:1.7.4]
>      at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
> ~[avro-1.7.4.jar:1.7.4]
>      at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38)
> ~[blitz-shared.jar:na]
>
> What i understood from this
> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
> document
> that this should have been backward compatible but somehow that doesn't
> seem to be the case. Any idea what i am doing wrong?
>


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: Avro schema doesn't honor backward compatibilty

Posted by Ryan Blue <bl...@cloudera.com>.
Hi Raghvendra,

It looks like the problem is that you're using the new schema in place 
of the schema that the data was written with.  You should run setSchema 
on your SpecificDatumReader to set the schema the data was written with.

What's happening is that the schema you're using, the new one, has the 
new field so Avro assumes it is present and tries to read it. By setting 
the schema that the data was actually written with, the datum reader 
will know that it isn't present and will use your default instead. When 
you read data encoded with the new schema, you need to use it as the 
written schema instead so the datum reader knows that the field should 
be read.

Does that make sense?

rb

On 02/01/2016 12:31 PM, Raghvendra Singh wrote:
> down votefavorite
> <http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty#>
>
> I have this avro schema
>
> {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       }
>    ]}
>
> Here is the code which we use to parse the data
>
> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>          DatumReader<MyPayLoad> payloadReader = new
> SpecificDatumReader<>(MyPayLoad.class);
>          Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>          MyPayLoad myPayLoad = null;
>          try {
>              myPayLoad = payloadReader.read(null, decoder);
>          } catch (IOException e) {
>              logger.log(Level.SEVERE, e.getMessage(), e);
>          }
>
>          return myPayLoad;
>      }
>
> Now i want to add one more field int the schema so the schema looks like
> below
>
>   {
>   "namespace": "xx.xxxx.xxxxx.xxxxx",
>   "type": "record",
>   "name": "MyPayLoad",
>   "fields": [
>       {"name": "filed1",  "type": "string"},
>       {"name": "filed2",     "type": "long"},
>       {"name": "filed3",  "type": "boolean"},
>       {
>            "name" : "metrics",
>            "type":
>            {
>               "type" : "array",
>               "items":
>               {
>                   "name": "MyRecord",
>                   "type": "record",
>                   "fields" :
>                       [
>                         {"name": "min", "type": "long"},
>                         {"name": "max", "type": "long"},
>                         {"name": "sum", "type": "long"},
>                         {"name": "count", "type": "long"}
>                       ]
>               }
>            }
>       }
>       {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>    ]}
>
> Note the filed added and also the default is defined. The problem is that
> if we receive the data which was written using the older schema i get this
> error
>
> java.io.EOFException: null
>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>
> What I understood from this
> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html>
> document is that this should have been backward compatible, but somehow that
> doesn't seem to be the case. Any idea what I am doing wrong?
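
For the old payloads the new default alone isn't enough: binaryDecoder plus
SpecificDatumReader(MyPayLoad.class) decodes the bytes as if they were written
with the new schema, so the reader runs off the end of the buffer looking for
agentType. Schema resolution only kicks in when the reader also knows the
writer's (old) schema. A minimal sketch of that, assuming MyPayLoad has been
regenerated from the new schema (so getClassSchema() returns it) and that the
caller can supply the old schema the data was written with (the writerSchema
parameter below is illustrative, not part of the original code):

import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.specific.SpecificDatumReader;

public static MyPayLoad parseBinaryPayload(byte[] payload, Schema writerSchema) throws IOException {
    // writerSchema: the old schema the payload was actually written with.
    // MyPayLoad.getClassSchema(): the new (reader) schema the class was generated from.
    DatumReader<MyPayLoad> payloadReader =
            new SpecificDatumReader<>(writerSchema, MyPayLoad.getClassSchema());
    Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
    // With both schemas known, resolution fills agentType from its default for old data.
    return payloadReader.read(null, decoder);
}

With both schemas supplied, the missing field is populated from its default
instead of the decoder trying to read it from the wire.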
>


-- 
Ryan Blue
Software Engineer
Cloudera, Inc.

Re: Avro schema doesn't honor backward compatibilty

Posted by Raghvendra Singh <rs...@appdynamics.com>.
I also posted this on stackoverflow but haven't got any response.

Here is the link
http://stackoverflow.com/questions/34733604/avro-schema-doesnt-honor-backward-compatibilty

On Mon, Feb 1, 2016 at 3:51 PM, Raghvendra Singh <rs...@appdynamics.com>
wrote:

> Thanks Prajwal
>
> I tried what you suggested but I still get the same error.
>
>
>
> On Mon, Feb 1, 2016 at 2:05 PM, Prajwal Tuladhar <pr...@infynyxx.com>
> wrote:
>
>> Hi,
>>
>> I think your usage of default for field "agentType" is invalid here.
>>
>> When generating code from an invalid schema, it tends to fail:
>>
>> [INFO]
>>> [INFO] --- avro-maven-plugin:1.7.6-cdh5.4.4:schema (default) @ test-app
>>> ---
>>> [WARNING] Avro: Invalid default for field agentType: "APP_AGENT" not a
>>> ["null","string"]
>>
>>
>> Try:
>>
>> {
>>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>  "type": "record",
>>>  "name": "MyPayLoad",
>>>  "fields": [
>>>      {"name": "filed1",  "type": "string"},
>>>      {"name": "filed2",     "type": "long"},
>>>      {"name": "filed3",  "type": "boolean"},
>>>      {
>>>           "name" : "metrics",
>>>           "type":
>>>           {
>>>              "type" : "array",
>>>              "items":
>>>              {
>>>                  "name": "MyRecord",
>>>                  "type": "record",
>>>                  "fields" :
>>>                      [
>>>                        {"name": "min", "type": "long"},
>>>                        {"name": "max", "type": "long"},
>>>                        {"name": "sum", "type": "long"},
>>>                        {"name": "count", "type": "long"}
>>>                      ]
>>>              }
>>>           }
>>>      },
>>>      {"name": "agentType",  "type": ["null", "string"], "default": null}
>>>   ]
>>> }
>>
>>
>>
>>
>>
>> On Mon, Feb 1, 2016 at 8:31 PM, Raghvendra Singh <rs...@appdynamics.com>
>> wrote:
>>
>>>
>>>
>>>
>>> I have this avro schema
>>>
>>> {
>>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>  "type": "record",
>>>  "name": "MyPayLoad",
>>>  "fields": [
>>>      {"name": "filed1",  "type": "string"},
>>>      {"name": "filed2",     "type": "long"},
>>>      {"name": "filed3",  "type": "boolean"},
>>>      {
>>>           "name" : "metrics",
>>>           "type":
>>>           {
>>>              "type" : "array",
>>>              "items":
>>>              {
>>>                  "name": "MyRecord",
>>>                  "type": "record",
>>>                  "fields" :
>>>                      [
>>>                        {"name": "min", "type": "long"},
>>>                        {"name": "max", "type": "long"},
>>>                        {"name": "sum", "type": "long"},
>>>                        {"name": "count", "type": "long"}
>>>                      ]
>>>              }
>>>           }
>>>      }
>>>   ]}
>>>
>>> Here is the code which we use to parse the data
>>>
>>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>>         DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
>>>         Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>>         MyPayLoad myPayLoad = null;
>>>         try {
>>>             myPayLoad = payloadReader.read(null, decoder);
>>>         } catch (IOException e) {
>>>             logger.log(Level.SEVERE, e.getMessage(), e);
>>>         }
>>>
>>>         return myPayLoad;
>>>     }
>>>
>>> Now I want to add one more field in the schema, so the schema looks like the one below
>>>
>>>  {
>>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>>  "type": "record",
>>>  "name": "MyPayLoad",
>>>  "fields": [
>>>      {"name": "filed1",  "type": "string"},
>>>      {"name": "filed2",     "type": "long"},
>>>      {"name": "filed3",  "type": "boolean"},
>>>      {
>>>           "name" : "metrics",
>>>           "type":
>>>           {
>>>              "type" : "array",
>>>              "items":
>>>              {
>>>                  "name": "MyRecord",
>>>                  "type": "record",
>>>                  "fields" :
>>>                      [
>>>                        {"name": "min", "type": "long"},
>>>                        {"name": "max", "type": "long"},
>>>                        {"name": "sum", "type": "long"},
>>>                        {"name": "count", "type": "long"}
>>>                      ]
>>>              }
>>>           }
>>>      },
>>>      {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>>>   ]}
>>>
>>> Note the field added and also that the default is defined. The problem is
>>> that if we receive data which was written using the older schema, I get
>>> this error
>>>
>>> java.io.EOFException: null
>>>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>>>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>>>
>>> What I understood from this
>>> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html> document
>>> is that this should have been backward compatible, but somehow that doesn't
>>> seem to be the case. Any idea what I am doing wrong?
>>>
>>
>>
>>
>> --
>> --
>> Cheers,
>> Praj
>>
>
>

Re: Avro schema doesn't honor backward compatibilty

Posted by Raghvendra Singh <rs...@appdynamics.com>.
Thanks Prajwal

I tried what you suggested but I still get the same error.



On Mon, Feb 1, 2016 at 2:05 PM, Prajwal Tuladhar <pr...@infynyxx.com> wrote:

> Hi,
>
> I think your usage of default for field "agentType" is invalid here.
>
> When generating code from an invalid schema, it tends to fail:
>
> [INFO]
>> [INFO] --- avro-maven-plugin:1.7.6-cdh5.4.4:schema (default) @ test-app
>> ---
>> [WARNING] Avro: Invalid default for field agentType: "APP_AGENT" not a
>> ["null","string"]
>
>
> Try:
>
> {
>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>  "type": "record",
>>  "name": "MyPayLoad",
>>  "fields": [
>>      {"name": "filed1",  "type": "string"},
>>      {"name": "filed2",     "type": "long"},
>>      {"name": "filed3",  "type": "boolean"},
>>      {
>>           "name" : "metrics",
>>           "type":
>>           {
>>              "type" : "array",
>>              "items":
>>              {
>>                  "name": "MyRecord",
>>                  "type": "record",
>>                  "fields" :
>>                      [
>>                        {"name": "min", "type": "long"},
>>                        {"name": "max", "type": "long"},
>>                        {"name": "sum", "type": "long"},
>>                        {"name": "count", "type": "long"}
>>                      ]
>>              }
>>           }
>>      },
>>      {"name": "agentType",  "type": ["null", "string"], "default": null}
>>   ]
>> }
>
>
>
>
>
> On Mon, Feb 1, 2016 at 8:31 PM, Raghvendra Singh <rs...@appdynamics.com>
> wrote:
>
>>
>>
>>
>> I have this avro schema
>>
>> {
>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>  "type": "record",
>>  "name": "MyPayLoad",
>>  "fields": [
>>      {"name": "filed1",  "type": "string"},
>>      {"name": "filed2",     "type": "long"},
>>      {"name": "filed3",  "type": "boolean"},
>>      {
>>           "name" : "metrics",
>>           "type":
>>           {
>>              "type" : "array",
>>              "items":
>>              {
>>                  "name": "MyRecord",
>>                  "type": "record",
>>                  "fields" :
>>                      [
>>                        {"name": "min", "type": "long"},
>>                        {"name": "max", "type": "long"},
>>                        {"name": "sum", "type": "long"},
>>                        {"name": "count", "type": "long"}
>>                      ]
>>              }
>>           }
>>      }
>>   ]}
>>
>> Here is the code which we use to parse the data
>>
>> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>>         DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
>>         Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>>         MyPayLoad myPayLoad = null;
>>         try {
>>             myPayLoad = payloadReader.read(null, decoder);
>>         } catch (IOException e) {
>>             logger.log(Level.SEVERE, e.getMessage(), e);
>>         }
>>
>>         return myPayLoad;
>>     }
>>
>> Now I want to add one more field in the schema, so the schema looks like the one below
>>
>>  {
>>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>>  "type": "record",
>>  "name": "MyPayLoad",
>>  "fields": [
>>      {"name": "filed1",  "type": "string"},
>>      {"name": "filed2",     "type": "long"},
>>      {"name": "filed3",  "type": "boolean"},
>>      {
>>           "name" : "metrics",
>>           "type":
>>           {
>>              "type" : "array",
>>              "items":
>>              {
>>                  "name": "MyRecord",
>>                  "type": "record",
>>                  "fields" :
>>                      [
>>                        {"name": "min", "type": "long"},
>>                        {"name": "max", "type": "long"},
>>                        {"name": "sum", "type": "long"},
>>                        {"name": "count", "type": "long"}
>>                      ]
>>              }
>>           }
>>      },
>>      {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>>   ]}
>>
>> Note the field added and also that the default is defined. The problem is that
>> if we receive data which was written using the older schema, I get this error
>>
>> java.io.EOFException: null
>>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>>
>> What I understood from this
>> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html> document
>> is that this should have been backward compatible, but somehow that doesn't
>> seem to be the case. Any idea what I am doing wrong?
>>
>
>
>
> --
> --
> Cheers,
> Praj
>

Re: Avro schema doesn't honor backward compatibilty

Posted by Prajwal Tuladhar <pr...@infynyxx.com>.
Hi,

I think your usage of default for field "agentType" is invalid here.

When generating code from an invalid schema, it tends to fail:

[INFO]
> [INFO] --- avro-maven-plugin:1.7.6-cdh5.4.4:schema (default) @ test-app ---
> [WARNING] Avro: Invalid default for field agentType: "APP_AGENT" not a ["null","string"]


Try:

{
>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>      {"name": "filed1",  "type": "string"},
>      {"name": "filed2",     "type": "long"},
>      {"name": "filed3",  "type": "boolean"},
>      {
>           "name" : "metrics",
>           "type":
>           {
>              "type" : "array",
>              "items":
>              {
>                  "name": "MyRecord",
>                  "type": "record",
>                  "fields" :
>                      [
>                        {"name": "min", "type": "long"},
>                        {"name": "max", "type": "long"},
>                        {"name": "sum", "type": "long"},
>                        {"name": "count", "type": "long"}
>                      ]
>              }
>           }
>      },
>      {"name": "agentType",  "type": ["null", "string"], "default": null}
>   ]
> }
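
The rule behind that warning: for a union-typed field, the default value must
match the first branch of the union, so ["null", "string"] needs "default":
null. If a non-null default such as "APP_AGENT" is really wanted, the union
would have to be written as ["string", "null"] instead. As a sketch (assuming
Avro 1.7.7 or newer, which ships org.apache.avro.SchemaCompatibility; the
stack trace above shows 1.7.4), the two schema versions can also be checked
for read compatibility programmatically; oldSchemaJson and newSchemaJson below
are placeholders for the two schema texts:

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

// Parse the schema the data was written with and the schema you want to read with.
Schema writer = new Schema.Parser().parse(oldSchemaJson);
Schema reader = new Schema.Parser().parse(newSchemaJson);

SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
System.out.println(result.getType());         // COMPATIBLE means old data can be read with the new schema
System.out.println(result.getDescription());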





On Mon, Feb 1, 2016 at 8:31 PM, Raghvendra Singh <rs...@appdynamics.com>
wrote:

>
>
>
> I have this avro schema
>
> {
>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>      {"name": "filed1",  "type": "string"},
>      {"name": "filed2",     "type": "long"},
>      {"name": "filed3",  "type": "boolean"},
>      {
>           "name" : "metrics",
>           "type":
>           {
>              "type" : "array",
>              "items":
>              {
>                  "name": "MyRecord",
>                  "type": "record",
>                  "fields" :
>                      [
>                        {"name": "min", "type": "long"},
>                        {"name": "max", "type": "long"},
>                        {"name": "sum", "type": "long"},
>                        {"name": "count", "type": "long"}
>                      ]
>              }
>           }
>      }
>   ]}
>
> Here is the code which we use to parse the data
>
> public static final MyPayLoad parseBinaryPayload(byte[] payload) {
>         DatumReader<MyPayLoad> payloadReader = new SpecificDatumReader<>(MyPayLoad.class);
>         Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
>         MyPayLoad myPayLoad = null;
>         try {
>             myPayLoad = payloadReader.read(null, decoder);
>         } catch (IOException e) {
>             logger.log(Level.SEVERE, e.getMessage(), e);
>         }
>
>         return myPayLoad;
>     }
>
> Now I want to add one more field in the schema, so the schema looks like the one below
>
>  {
>  "namespace": "xx.xxxx.xxxxx.xxxxx",
>  "type": "record",
>  "name": "MyPayLoad",
>  "fields": [
>      {"name": "filed1",  "type": "string"},
>      {"name": "filed2",     "type": "long"},
>      {"name": "filed3",  "type": "boolean"},
>      {
>           "name" : "metrics",
>           "type":
>           {
>              "type" : "array",
>              "items":
>              {
>                  "name": "MyRecord",
>                  "type": "record",
>                  "fields" :
>                      [
>                        {"name": "min", "type": "long"},
>                        {"name": "max", "type": "long"},
>                        {"name": "sum", "type": "long"},
>                        {"name": "count", "type": "long"}
>                      ]
>              }
>           }
>      },
>      {"name": "agentType",  "type": ["null", "string"], "default": "APP_AGENT"}
>   ]}
>
> Note the field added and also that the default is defined. The problem is that
> if we receive data which was written using the older schema, I get this error
>
> java.io.EOFException: null
>     at org.apache.avro.io.BinaryDecoder.ensureBounds(BinaryDecoder.java:473) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.BinaryDecoder.readInt(BinaryDecoder.java:128) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.BinaryDecoder.readIndex(BinaryDecoder.java:423) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:229) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.parsing.Parser.advance(Parser.java:88) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:206) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:152) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148) ~[avro-1.7.4.jar:1.7.4]
>     at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139) ~[avro-1.7.4.jar:1.7.4]
>     at com.appdynamics.blitz.shared.util.XXXXXXXXXXXXX.parseBinaryPayload(BlitzAvroSharedUtil.java:38) ~[blitz-shared.jar:na]
>
> What I understood from this
> <https://martin.kleppmann.com/2012/12/05/schema-evolution-in-avro-protocol-buffers-thrift.html> document
> is that this should have been backward compatible, but somehow that doesn't
> seem to be the case. Any idea what I am doing wrong?
>



-- 
--
Cheers,
Praj