Posted to user@avro.apache.org by Alex Holmes <gr...@gmail.com> on 2011/09/19 14:12:53 UTC

Avro versioning and SpecificDatum's

Hi,

I'm starting to play with how I can support versioning with Avro.  I
created an initial schema, code-generated some Java classes using
"org.apache.avro.tool.Main compile protocol", and then used the
DataFileWriter (with a SpecificDatumWriter) to serialize my objects to
a file.

I then modified my original schema by adding, deleting and renaming
some fields, creating version 2 of the schema.  After re-creating the
Java classes I attempted to read the version 1 file using the
DataFileStream (with a SpecificDatumReader), and this is throwing an
exception.

Is versioning supported in conjunction with the SpecificDatum*
reader/writer classes, or do I have to work at the GenericDatum level
for this to work?

Many thanks,
Alex
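(For context, the code-generation step mentioned above looks roughly like the following; the jar name, protocol file, and output directory are illustrative, not taken from the thread:)

```shell
# Sketch of invoking Avro's code generator on a protocol file.
# avro-tools.jar path and file names are placeholders.
java -cp avro-tools.jar org.apache.avro.tool.Main \
    compile protocol record.avpr generated-src/
```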

Re: Avro versioning and SpecificDatum's

Posted by Chris Wilkes <cw...@gmail.com>.
I'm interested in this as well. For now I've put my versioning in the
package namespace of the Avro definition, i.e.:
  com.example.avro.v1.Car
  com.example.avro.v2.Car
After all my documents that had the v1.Car have been reprocessed and
are out of use I delete the old definition.

Chris
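(Under that convention each schema version is a distinct record type in its own namespace; a minimal sketch, with illustrative field names:)

```
{"namespace": "com.example.avro.v1",
 "type": "record", "name": "Car",
 "fields": [
   {"name": "make", "type": "string"},
   {"name": "model", "type": "string"}
 ]
}
```

A v2 schema would be identical except for `"namespace": "com.example.avro.v2"`, so both generated classes can coexist on the classpath while old documents are reprocessed.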

On Mon, Sep 19, 2011 at 5:12 AM, Alex Holmes <gr...@gmail.com> wrote:
> Hi,
>
> I'm starting to play with how I can support versioning with Avro.  I
> created an initial schema, code-generated some Java classes using
> "org.apache.avro.tool.Main compile protocol", and then used the
> DataFileWriter (with a SpecificDatumWriter) to serialize my objects to
> a file.
>
> I then modified my original schema by adding, deleting and renaming
> some fields, creating version 2 of the schema.  After re-creating the
> Java classes I attempted to read the version 1 file using the
> DataFileStream (with a SpecificDatumReader), and this is throwing an
> exception.
>
> Is versioning supported in conjunction with the SpecificDatum*
> reader/writer classes, or do I have to work at the GenericDatum level
> for this to work?
>
> Many thanks,
> Alex
>

Re: Avro versioning and SpecificDatum's

Posted by Alex Holmes <gr...@gmail.com>.
Thanks, that fixed my issue.

On Tue, Sep 20, 2011 at 2:51 PM, Scott Carey <sc...@apache.org> wrote:
> As Doug mentioned in the ticket, the problem is likely:
>
> new SpecificDatumReader<Record>()
>
>
> This should be
>
> new SpecificDatumReader<Record>(Record.class)
>
>
> Which sets the reader to resolve to the schema found in Record.class
>
>
>
> On 9/20/11 3:44 AM, "Alex Holmes" <gr...@gmail.com> wrote:
>
>>Created the following ticket:
>>
>>https://issues.apache.org/jira/browse/AVRO-891
>>
>>Thanks,
>>Alex
>>
>>On Tue, Sep 20, 2011 at 6:26 AM, Alex Holmes <gr...@gmail.com> wrote:
>>> Thanks, I'll add a bug.
>>>
>>> As a FYI, even without the alias (retaining the original field name),
>>> just removing the "id" field yields the exception.
>>>
>>> On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <sc...@apache.org>
>>>wrote:
>>>> That looks like a bug.  What happens if there is no aliasing/renaming
>>>> involved?  Aliasing is a newer feature than field addition, removal,
>>>>and
>>>> promotion.
>>>>
>>>> This should be easy to reproduce, can you file a JIRA ticket?  We
>>>>should
>>>> discuss this further there.
>>>>
>>>> Thanks!
>>>>
>>>>
>>>> On 9/19/11 6:14 PM, "Alex Holmes" <gr...@gmail.com> wrote:
>>>>
>>>>>OK, I was able to reproduce the exception.
>>>>>
>>>>>v1:
>>>>>{"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name", "type": "string"},
>>>>>    {"name": "id", "type": "int"}
>>>>>  ]
>>>>>}
>>>>>
>>>>>v2:
>>>>>{"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>>>>  ]
>>>>>}
>>>>>
>>>>>Step 1.  Write Avro file using v1 generated class
>>>>>Step 2.  Read Avro file using v2 generated class
>>>>>
>>>>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>>>>       at Record.put(Unknown Source)
>>>>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>>>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>>>       at Read.readFromAvro(Unknown Source)
>>>>>       at Read.main(Unknown Source)
>>>>>
>>>>>The code to write/read the avro file didn't change from below.
>>>>>
>>>>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com>
>>>>>wrote:
>>>>>> I'm trying to put together a simple test case to reproduce the
>>>>>> exception.  While I was creating the test case, I hit this behavior
>>>>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>>>>> forward/backward compatibility should work:
>>>>>>
>>>>>> Schema v1:
>>>>>>
>>>>>> {"name": "Record", "type": "record",
>>>>>>  "fields": [
>>>>>>    {"name": "name", "type": "string"},
>>>>>>    {"name": "id", "type": "int"}
>>>>>>  ]
>>>>>> }
>>>>>>
>>>>>> Schema v2:
>>>>>>
>>>>>> {"name": "Record", "type": "record",
>>>>>>  "fields": [
>>>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>>>>    {"name": "new_field", "type": "int", "default":"0"}
>>>>>>  ]
>>>>>> }
>>>>>>
>>>>>> In the 2nd version I:
>>>>>>
>>>>>> - removed field "id"
>>>>>> - renamed field "name" to "name_rename"
>>>>>> - added field "new_field"
>>>>>>
>>>>>> I write the v1 data file:
>>>>>>
>>>>>>  public static Record createRecord(String name, int id) {
>>>>>>    Record record = new Record();
>>>>>>    record.name = name;
>>>>>>    record.id = id;
>>>>>>    return record;
>>>>>>  }
>>>>>>
>>>>>>  public static void writeToAvro(OutputStream outputStream)
>>>>>>      throws IOException {
>>>>>>    DataFileWriter<Record> writer =
>>>>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>>>>    writer.create(Record.SCHEMA$, outputStream);
>>>>>>
>>>>>>    writer.append(createRecord("r1", 1));
>>>>>>    writer.append(createRecord("r2", 2));
>>>>>>
>>>>>>    writer.close();
>>>>>>    outputStream.close();
>>>>>>  }
>>>>>>
>>>>>> I wrote a version-agnostic Read class:
>>>>>>
>>>>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>>>>            is, new SpecificDatumReader<Record>());
>>>>>>    for (Record a : reader) {
>>>>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>>>>    }
>>>>>>    IOUtils.cleanup(null, is);
>>>>>>    IOUtils.cleanup(null, reader);
>>>>>>  }
>>>>>>
>>>>>> Running the Read code against the v1 data file, and including the v1
>>>>>> code-generated classes in the classpath produced:
>>>>>>
>>>>>> Record@6a8c436b[name=r1,id=1]
>>>>>> Record@6baa9f99[name=r2,id=2]
>>>>>>
>>>>>> If I run the same code, but use just the v2 generated classes in the
>>>>>> classpath I get:
>>>>>>
>>>>>> Record@39dd3812[name_rename=r1,new_field=1]
>>>>>> Record@27b15692[name_rename=r2,new_field=2]
>>>>>>
>>>>>> The name_rename field seems to be good, but why would "new_field"
>>>>>> inherit the values of the deleted field "id"?
>>>>>>
>>>>>> Cheers,
>>>>>> Alex
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org>
>>>>>>wrote:
>>>>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>>>>> I then modified my original schema by adding, deleting and renaming
>>>>>>>> some fields, creating version 2 of the schema.  After re-creating
>>>>>>>>the
>>>>>>>> Java classes I attempted to read the version 1 file using the
>>>>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing
>>>>>>>>an
>>>>>>>> exception.
>>>>>>>
>>>>>>> This should work.  Can you provide more detail?  What is the
>>>>>>>exception?
>>>>>>>  A reproducible test case would be great to have.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Doug
>>>>>>>
>>>>>>
>>>>
>>>>
>>>>
>>>
>
>
>

Re: Avro versioning and SpecificDatum's

Posted by Scott Carey <sc...@apache.org>.
As Doug mentioned in the ticket, the problem is likely:

new SpecificDatumReader<Record>()


This should be

new SpecificDatumReader<Record>(Record.class)


Which sets the reader to resolve to the schema found in Record.class
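(A minimal sketch of the corrected reader, assuming the v2 generated `Record` class and the Avro library on the classpath:)

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.specific.SpecificDatumReader;

public class Read {
  public static void readFromAvro(InputStream is) throws IOException {
    // Passing Record.class sets the *reader* schema, so Avro resolves the
    // file's writer (v1) schema against the current (v2) generated class.
    DataFileStream<Record> reader = new DataFileStream<Record>(
        is, new SpecificDatumReader<Record>(Record.class));
    try {
      for (Record r : reader) {
        System.out.println(r);
      }
    } finally {
      reader.close();
    }
  }
}
```

Without an explicit reader schema, the reader falls back to the writer's schema from the file and sets fields by position into the new class, which would explain both the "Bad index" exception and "new_field" picking up the old "id" values.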



On 9/20/11 3:44 AM, "Alex Holmes" <gr...@gmail.com> wrote:

>Created the following ticket:
>
>https://issues.apache.org/jira/browse/AVRO-891
>
>Thanks,
>Alex
>
>On Tue, Sep 20, 2011 at 6:26 AM, Alex Holmes <gr...@gmail.com> wrote:
>> Thanks, I'll add a bug.
>>
>> As a FYI, even without the alias (retaining the original field name),
>> just removing the "id" field yields the exception.
>>
>> On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <sc...@apache.org>
>>wrote:
>>> That looks like a bug.  What happens if there is no aliasing/renaming
>>> involved?  Aliasing is a newer feature than field addition, removal,
>>>and
>>> promotion.
>>>
>>> This should be easy to reproduce, can you file a JIRA ticket?  We
>>>should
>>> discuss this further there.
>>>
>>> Thanks!
>>>
>>>
>>> On 9/19/11 6:14 PM, "Alex Holmes" <gr...@gmail.com> wrote:
>>>
>>>>OK, I was able to reproduce the exception.
>>>>
>>>>v1:
>>>>{"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name", "type": "string"},
>>>>    {"name": "id", "type": "int"}
>>>>  ]
>>>>}
>>>>
>>>>v2:
>>>>{"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>>>  ]
>>>>}
>>>>
>>>>Step 1.  Write Avro file using v1 generated class
>>>>Step 2.  Read Avro file using v2 generated class
>>>>
>>>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>>>       at Record.put(Unknown Source)
>>>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>>       at Read.readFromAvro(Unknown Source)
>>>>       at Read.main(Unknown Source)
>>>>
>>>>The code to write/read the avro file didn't change from below.
>>>>
>>>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com>
>>>>wrote:
>>>>> I'm trying to put together a simple test case to reproduce the
>>>>> exception.  While I was creating the test case, I hit this behavior
>>>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>>>> forward/backward compatibility should work:
>>>>>
>>>>> Schema v1:
>>>>>
>>>>> {"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name", "type": "string"},
>>>>>    {"name": "id", "type": "int"}
>>>>>  ]
>>>>> }
>>>>>
>>>>> Schema v2:
>>>>>
>>>>> {"name": "Record", "type": "record",
>>>>>  "fields": [
>>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>>>    {"name": "new_field", "type": "int", "default":"0"}
>>>>>  ]
>>>>> }
>>>>>
>>>>> In the 2nd version I:
>>>>>
>>>>> - removed field "id"
>>>>> - renamed field "name" to "name_rename"
>>>>> - added field "new_field"
>>>>>
>>>>> I write the v1 data file:
>>>>>
>>>>>  public static Record createRecord(String name, int id) {
>>>>>    Record record = new Record();
>>>>>    record.name = name;
>>>>>    record.id = id;
>>>>>    return record;
>>>>>  }
>>>>>
>>>>>  public static void writeToAvro(OutputStream outputStream)
>>>>>      throws IOException {
>>>>>    DataFileWriter<Record> writer =
>>>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>>>    writer.create(Record.SCHEMA$, outputStream);
>>>>>
>>>>>    writer.append(createRecord("r1", 1));
>>>>>    writer.append(createRecord("r2", 2));
>>>>>
>>>>>    writer.close();
>>>>>    outputStream.close();
>>>>>  }
>>>>>
>>>>> I wrote a version-agnostic Read class:
>>>>>
>>>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>>>            is, new SpecificDatumReader<Record>());
>>>>>    for (Record a : reader) {
>>>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>>>    }
>>>>>    IOUtils.cleanup(null, is);
>>>>>    IOUtils.cleanup(null, reader);
>>>>>  }
>>>>>
>>>>> Running the Read code against the v1 data file, and including the v1
>>>>> code-generated classes in the classpath produced:
>>>>>
>>>>> Record@6a8c436b[name=r1,id=1]
>>>>> Record@6baa9f99[name=r2,id=2]
>>>>>
>>>>> If I run the same code, but use just the v2 generated classes in the
>>>>> classpath I get:
>>>>>
>>>>> Record@39dd3812[name_rename=r1,new_field=1]
>>>>> Record@27b15692[name_rename=r2,new_field=2]
>>>>>
>>>>> The name_rename field seems to be good, but why would "new_field"
>>>>> inherit the values of the deleted field "id"?
>>>>>
>>>>> Cheers,
>>>>> Alex
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org>
>>>>>wrote:
>>>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>>>> I then modified my original schema by adding, deleting and renaming
>>>>>>> some fields, creating version 2 of the schema.  After re-creating
>>>>>>>the
>>>>>>> Java classes I attempted to read the version 1 file using the
>>>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing
>>>>>>>an
>>>>>>> exception.
>>>>>>
>>>>>> This should work.  Can you provide more detail?  What is the
>>>>>>exception?
>>>>>>  A reproducible test case would be great to have.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Doug
>>>>>>
>>>>>
>>>
>>>
>>>
>>



Re: Avro versioning and SpecificDatum's

Posted by Alex Holmes <gr...@gmail.com>.
Created the following ticket:

https://issues.apache.org/jira/browse/AVRO-891

Thanks,
Alex

On Tue, Sep 20, 2011 at 6:26 AM, Alex Holmes <gr...@gmail.com> wrote:
> Thanks, I'll add a bug.
>
> As a FYI, even without the alias (retaining the original field name),
> just removing the "id" field yields the exception.
>
> On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <sc...@apache.org> wrote:
>> That looks like a bug.  What happens if there is no aliasing/renaming
>> involved?  Aliasing is a newer feature than field addition, removal, and
>> promotion.
>>
>> This should be easy to reproduce, can you file a JIRA ticket?  We should
>> discuss this further there.
>>
>> Thanks!
>>
>>
>> On 9/19/11 6:14 PM, "Alex Holmes" <gr...@gmail.com> wrote:
>>
>>>OK, I was able to reproduce the exception.
>>>
>>>v1:
>>>{"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name", "type": "string"},
>>>    {"name": "id", "type": "int"}
>>>  ]
>>>}
>>>
>>>v2:
>>>{"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>>  ]
>>>}
>>>
>>>Step 1.  Write Avro file using v1 generated class
>>>Step 2.  Read Avro file using v2 generated class
>>>
>>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>>       at Record.put(Unknown Source)
>>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>>       at Read.readFromAvro(Unknown Source)
>>>       at Read.main(Unknown Source)
>>>
>>>The code to write/read the avro file didn't change from below.
>>>
>>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com> wrote:
>>>> I'm trying to put together a simple test case to reproduce the
>>>> exception.  While I was creating the test case, I hit this behavior
>>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>>> forward/backward compatibility should work:
>>>>
>>>> Schema v1:
>>>>
>>>> {"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name", "type": "string"},
>>>>    {"name": "id", "type": "int"}
>>>>  ]
>>>> }
>>>>
>>>> Schema v2:
>>>>
>>>> {"name": "Record", "type": "record",
>>>>  "fields": [
>>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>>    {"name": "new_field", "type": "int", "default":"0"}
>>>>  ]
>>>> }
>>>>
>>>> In the 2nd version I:
>>>>
>>>> - removed field "id"
>>>> - renamed field "name" to "name_rename"
>>>> - added field "new_field"
>>>>
>>>> I write the v1 data file:
>>>>
>>>>  public static Record createRecord(String name, int id) {
>>>>    Record record = new Record();
>>>>    record.name = name;
>>>>    record.id = id;
>>>>    return record;
>>>>  }
>>>>
>>>>  public static void writeToAvro(OutputStream outputStream)
>>>>      throws IOException {
>>>>    DataFileWriter<Record> writer =
>>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>>    writer.create(Record.SCHEMA$, outputStream);
>>>>
>>>>    writer.append(createRecord("r1", 1));
>>>>    writer.append(createRecord("r2", 2));
>>>>
>>>>    writer.close();
>>>>    outputStream.close();
>>>>  }
>>>>
>>>> I wrote a version-agnostic Read class:
>>>>
>>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>>            is, new SpecificDatumReader<Record>());
>>>>    for (Record a : reader) {
>>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>>    }
>>>>    IOUtils.cleanup(null, is);
>>>>    IOUtils.cleanup(null, reader);
>>>>  }
>>>>
>>>> Running the Read code against the v1 data file, and including the v1
>>>> code-generated classes in the classpath produced:
>>>>
>>>> Record@6a8c436b[name=r1,id=1]
>>>> Record@6baa9f99[name=r2,id=2]
>>>>
>>>> If I run the same code, but use just the v2 generated classes in the
>>>> classpath I get:
>>>>
>>>> Record@39dd3812[name_rename=r1,new_field=1]
>>>> Record@27b15692[name_rename=r2,new_field=2]
>>>>
>>>> The name_rename field seems to be good, but why would "new_field"
>>>> inherit the values of the deleted field "id"?
>>>>
>>>> Cheers,
>>>> Alex
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org>
>>>>wrote:
>>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>>> I then modified my original schema by adding, deleting and renaming
>>>>>> some fields, creating version 2 of the schema.  After re-creating the
>>>>>> Java classes I attempted to read the version 1 file using the
>>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>>>>> exception.
>>>>>
>>>>> This should work.  Can you provide more detail?  What is the exception?
>>>>>  A reproducible test case would be great to have.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Doug
>>>>>
>>>>
>>
>>
>>
>

Re: Avro versioning and SpecificDatum's

Posted by Alex Holmes <gr...@gmail.com>.
Thanks, I'll add a bug.

As an FYI, even without the alias (retaining the original field name),
just removing the "id" field yields the exception.

On Tue, Sep 20, 2011 at 2:22 AM, Scott Carey <sc...@apache.org> wrote:
> That looks like a bug.  What happens if there is no aliasing/renaming
> involved?  Aliasing is a newer feature than field addition, removal, and
> promotion.
>
> This should be easy to reproduce, can you file a JIRA ticket?  We should
> discuss this further there.
>
> Thanks!
>
>
> On 9/19/11 6:14 PM, "Alex Holmes" <gr...@gmail.com> wrote:
>
>>OK, I was able to reproduce the exception.
>>
>>v1:
>>{"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name", "type": "string"},
>>    {"name": "id", "type": "int"}
>>  ]
>>}
>>
>>v2:
>>{"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>>  ]
>>}
>>
>>Step 1.  Write Avro file using v1 generated class
>>Step 2.  Read Avro file using v2 generated class
>>
>>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>>       at Record.put(Unknown Source)
>>       at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>>       at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>>       at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>>       at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>>       at Read.readFromAvro(Unknown Source)
>>       at Read.main(Unknown Source)
>>
>>The code to write/read the avro file didn't change from below.
>>
>>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com> wrote:
>>> I'm trying to put together a simple test case to reproduce the
>>> exception.  While I was creating the test case, I hit this behavior
>>> which doesn't seem right, but maybe it's my misunderstanding on how
>>> forward/backward compatibility should work:
>>>
>>> Schema v1:
>>>
>>> {"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name", "type": "string"},
>>>    {"name": "id", "type": "int"}
>>>  ]
>>> }
>>>
>>> Schema v2:
>>>
>>> {"name": "Record", "type": "record",
>>>  "fields": [
>>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>>    {"name": "new_field", "type": "int", "default":"0"}
>>>  ]
>>> }
>>>
>>> In the 2nd version I:
>>>
>>> - removed field "id"
>>> - renamed field "name" to "name_rename"
>>> - added field "new_field"
>>>
>>> I write the v1 data file:
>>>
>>>  public static Record createRecord(String name, int id) {
>>>    Record record = new Record();
>>>    record.name = name;
>>>    record.id = id;
>>>    return record;
>>>  }
>>>
>>>  public static void writeToAvro(OutputStream outputStream)
>>>      throws IOException {
>>>    DataFileWriter<Record> writer =
>>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>>    writer.create(Record.SCHEMA$, outputStream);
>>>
>>>    writer.append(createRecord("r1", 1));
>>>    writer.append(createRecord("r2", 2));
>>>
>>>    writer.close();
>>>    outputStream.close();
>>>  }
>>>
>>> I wrote a version-agnostic Read class:
>>>
>>>  public static void readFromAvro(InputStream is) throws IOException {
>>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>>            is, new SpecificDatumReader<Record>());
>>>    for (Record a : reader) {
>>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>>    }
>>>    IOUtils.cleanup(null, is);
>>>    IOUtils.cleanup(null, reader);
>>>  }
>>>
>>> Running the Read code against the v1 data file, and including the v1
>>> code-generated classes in the classpath produced:
>>>
>>> Record@6a8c436b[name=r1,id=1]
>>> Record@6baa9f99[name=r2,id=2]
>>>
>>> If I run the same code, but use just the v2 generated classes in the
>>> classpath I get:
>>>
>>> Record@39dd3812[name_rename=r1,new_field=1]
>>> Record@27b15692[name_rename=r2,new_field=2]
>>>
>>> The name_rename field seems to be good, but why would "new_field"
>>> inherit the values of the deleted field "id"?
>>>
>>> Cheers,
>>> Alex
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org>
>>>wrote:
>>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>>> I then modified my original schema by adding, deleting and renaming
>>>>> some fields, creating version 2 of the schema.  After re-creating the
>>>>> Java classes I attempted to read the version 1 file using the
>>>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>>>> exception.
>>>>
>>>> This should work.  Can you provide more detail?  What is the exception?
>>>>  A reproducible test case would be great to have.
>>>>
>>>> Thanks,
>>>>
>>>> Doug
>>>>
>>>
>
>
>

Re: Avro versioning and SpecificDatum's

Posted by Scott Carey <sc...@apache.org>.
That looks like a bug.  What happens if there is no aliasing/renaming
involved?  Aliasing is a newer feature than field addition, removal, and
promotion.

This should be easy to reproduce, can you file a JIRA ticket?  We should
discuss this further there.

Thanks!


On 9/19/11 6:14 PM, "Alex Holmes" <gr...@gmail.com> wrote:

>OK, I was able to reproduce the exception.
>
>v1:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
>}
>
>v2:
>{"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name_rename", "type": "string", "aliases": ["name"]}
>  ]
>}
>
>Step 1.  Write Avro file using v1 generated class
>Step 2.  Read Avro file using v2 generated class
>
>Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
>	at Record.put(Unknown Source)
>	at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
>	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
>	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
>	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
>	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
>	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
>	at Read.readFromAvro(Unknown Source)
>	at Read.main(Unknown Source)
>
>The code to write/read the avro file didn't change from below.
>
>On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com> wrote:
>> I'm trying to put together a simple test case to reproduce the
>> exception.  While I was creating the test case, I hit this behavior
>> which doesn't seem right, but maybe it's my misunderstanding on how
>> forward/backward compatibility should work:
>>
>> Schema v1:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name", "type": "string"},
>>    {"name": "id", "type": "int"}
>>  ]
>> }
>>
>> Schema v2:
>>
>> {"name": "Record", "type": "record",
>>  "fields": [
>>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>>    {"name": "new_field", "type": "int", "default":"0"}
>>  ]
>> }
>>
>> In the 2nd version I:
>>
>> - removed field "id"
>> - renamed field "name" to "name_rename"
>> - added field "new_field"
>>
>> I write the v1 data file:
>>
>>  public static Record createRecord(String name, int id) {
>>    Record record = new Record();
>>    record.name = name;
>>    record.id = id;
>>    return record;
>>  }
>>
>>  public static void writeToAvro(OutputStream outputStream)
>>      throws IOException {
>>    DataFileWriter<Record> writer =
>>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>>    writer.create(Record.SCHEMA$, outputStream);
>>
>>    writer.append(createRecord("r1", 1));
>>    writer.append(createRecord("r2", 2));
>>
>>    writer.close();
>>    outputStream.close();
>>  }
>>
>> I wrote a version-agnostic Read class:
>>
>>  public static void readFromAvro(InputStream is) throws IOException {
>>    DataFileStream<Record> reader = new DataFileStream<Record>(
>>            is, new SpecificDatumReader<Record>());
>>    for (Record a : reader) {
>>      System.out.println(ToStringBuilder.reflectionToString(a));
>>    }
>>    IOUtils.cleanup(null, is);
>>    IOUtils.cleanup(null, reader);
>>  }
>>
>> Running the Read code against the v1 data file, and including the v1
>> code-generated classes in the classpath produced:
>>
>> Record@6a8c436b[name=r1,id=1]
>> Record@6baa9f99[name=r2,id=2]
>>
>> If I run the same code, but use just the v2 generated classes in the
>> classpath I get:
>>
>> Record@39dd3812[name_rename=r1,new_field=1]
>> Record@27b15692[name_rename=r2,new_field=2]
>>
>> The name_rename field seems to be good, but why would "new_field"
>> inherit the values of the deleted field "id"?
>>
>> Cheers,
>> Alex
>>
>>
>>
>>
>>
>>
>>
>> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org>
>>wrote:
>>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>>> I then modified my original schema by adding, deleting and renaming
>>>> some fields, creating version 2 of the schema.  After re-creating the
>>>> Java classes I attempted to read the version 1 file using the
>>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>>> exception.
>>>
>>> This should work.  Can you provide more detail?  What is the exception?
>>>  A reproducible test case would be great to have.
>>>
>>> Thanks,
>>>
>>> Doug
>>>
>>



Re: Avro versioning and SpecificDatum's

Posted by Alex Holmes <gr...@gmail.com>.
OK, I was able to reproduce the exception.

v1:
{"name": "Record", "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "int"}
  ]
}

v2:
{"name": "Record", "type": "record",
  "fields": [
    {"name": "name_rename", "type": "string", "aliases": ["name"]}
  ]
}

Step 1.  Write Avro file using v1 generated class
Step 2.  Read Avro file using v2 generated class

Exception in thread "main" org.apache.avro.AvroRuntimeException: Bad index
	at Record.put(Unknown Source)
	at org.apache.avro.generic.GenericData.setField(GenericData.java:463)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:166)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:138)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:129)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
	at Read.readFromAvro(Unknown Source)
	at Read.main(Unknown Source)

The code to write/read the avro file didn't change from below.

On Mon, Sep 19, 2011 at 9:08 PM, Alex Holmes <gr...@gmail.com> wrote:
> I'm trying to put together a simple test case to reproduce the
> exception.  While I was creating the test case, I hit this behavior
> which doesn't seem right, but maybe it's my misunderstanding on how
> forward/backward compatibility should work:
>
> Schema v1:
>
> {"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name", "type": "string"},
>    {"name": "id", "type": "int"}
>  ]
> }
>
> Schema v2:
>
> {"name": "Record", "type": "record",
>  "fields": [
>    {"name": "name_rename", "type": "string", "aliases": ["name"]},
>    {"name": "new_field", "type": "int", "default":"0"}
>  ]
> }
>
> In the 2nd version I:
>
> - removed field "id"
> - renamed field "name" to "name_rename"
> - added field "new_field"
>
> I write the v1 data file:
>
>  public static Record createRecord(String name, int id) {
>    Record record = new Record();
>    record.name = name;
>    record.id = id;
>    return record;
>  }
>
>  public static void writeToAvro(OutputStream outputStream)
>      throws IOException {
>    DataFileWriter<Record> writer =
>        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
>    writer.create(Record.SCHEMA$, outputStream);
>
>    writer.append(createRecord("r1", 1));
>    writer.append(createRecord("r2", 2));
>
>    writer.close();
>    outputStream.close();
>  }
>
> I wrote a version-agnostic Read class:
>
>  public static void readFromAvro(InputStream is) throws IOException {
>    DataFileStream<Record> reader = new DataFileStream<Record>(
>            is, new SpecificDatumReader<Record>());
>    for (Record a : reader) {
>      System.out.println(ToStringBuilder.reflectionToString(a));
>    }
>    IOUtils.cleanup(null, is);
>    IOUtils.cleanup(null, reader);
>  }
>
> Running the Read code against the v1 data file, and including the v1
> code-generated classes in the classpath produced:
>
> Record@6a8c436b[name=r1,id=1]
> Record@6baa9f99[name=r2,id=2]
>
> If I run the same code, but use just the v2 generated classes in the
> classpath I get:
>
> Record@39dd3812[name_rename=r1,new_field=1]
> Record@27b15692[name_rename=r2,new_field=2]
>
> The name_rename field seems to be good, but why would "new_field"
> inherit the values of the deleted field "id"?
>
> Cheers,
> Alex
>
>
>
>
>
>
>
> On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org> wrote:
>> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>>> I then modified my original schema by adding, deleting and renaming
>>> some fields, creating version 2 of the schema.  After re-creating the
>>> Java classes I attempted to read the version 1 file using the
>>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>>> exception.
>>
>> This should work.  Can you provide more detail?  What is the exception?
>>  A reproducible test case would be great to have.
>>
>> Thanks,
>>
>> Doug
>>
>

Re: Avro versioning and SpecificDatum's

Posted by Alex Holmes <gr...@gmail.com>.
I'm trying to put together a simple test case to reproduce the
exception.  While I was creating the test case, I hit this behavior
which doesn't seem right, but maybe it's my misunderstanding on how
forward/backward compatibility should work:

Schema v1:

{"name": "Record", "type": "record",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "id", "type": "int"}
  ]
}

Schema v2:

{"name": "Record", "type": "record",
  "fields": [
    {"name": "name_rename", "type": "string", "aliases": ["name"]},
    {"name": "new_field", "type": "int", "default":"0"}
  ]
}
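[Editorial aside: per the Avro specification, the default value for an "int" field should be a JSON number, not a string, so the `"default":"0"` in the v2 schema above may itself be rejected or ignored by the resolver. A corrected v2 schema sketch would be:]

```json
{"name": "Record", "type": "record",
  "fields": [
    {"name": "name_rename", "type": "string", "aliases": ["name"]},
    {"name": "new_field", "type": "int", "default": 0}
  ]
}
```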

In the 2nd version I:

- removed field "id"
- renamed field "name" to "name_rename"
- added field "new_field"

I write the v1 data file:

  public static Record createRecord(String name, int id) {
    Record record = new Record();
    record.name = name;
    record.id = id;
    return record;
  }

  public static void writeToAvro(OutputStream outputStream)
      throws IOException {
    DataFileWriter<Record> writer =
        new DataFileWriter<Record>(new SpecificDatumWriter<Record>());
    writer.create(Record.SCHEMA$, outputStream);

    writer.append(createRecord("r1", 1));
    writer.append(createRecord("r2", 2));

    writer.close();
    outputStream.close();
  }

I wrote a version-agnostic Read class:

  public static void readFromAvro(InputStream is) throws IOException {
    DataFileStream<Record> reader = new DataFileStream<Record>(
            is, new SpecificDatumReader<Record>());
    for (Record a : reader) {
      System.out.println(ToStringBuilder.reflectionToString(a));
    }
    IOUtils.cleanup(null, is);
    IOUtils.cleanup(null, reader);
  }

Running the Read code against the v1 data file, and including the v1
code-generated classes in the classpath produced:

Record@6a8c436b[name=r1,id=1]
Record@6baa9f99[name=r2,id=2]

If I run the same code, but use just the v2 generated classes in the
classpath I get:

Record@39dd3812[name_rename=r1,new_field=1]
Record@27b15692[name_rename=r2,new_field=2]

The name_rename field seems to be good, but why would "new_field"
inherit the values of the deleted field "id"?

Cheers,
Alex







On Mon, Sep 19, 2011 at 12:35 PM, Doug Cutting <cu...@apache.org> wrote:
> On 09/19/2011 05:12 AM, Alex Holmes wrote:
>> I then modified my original schema by adding, deleting and renaming
>> some fields, creating version 2 of the schema.  After re-creating the
>> Java classes I attempted to read the version 1 file using the
>> DataFileStream (with a SpecificDatumReader), and this is throwing an
>> exception.
>
> This should work.  Can you provide more detail?  What is the exception?
>  A reproducible test case would be great to have.
>
> Thanks,
>
> Doug
>

Re: Avro versioning and SpecificDatum's

Posted by Doug Cutting <cu...@apache.org>.
On 09/19/2011 05:12 AM, Alex Holmes wrote:
> I then modified my original schema by adding, deleting and renaming
> some fields, creating version 2 of the schema.  After re-creating the
> Java classes I attempted to read the version 1 file using the
> DataFileStream (with a SpecificDatumReader), and this is throwing an
> exception.

This should work.  Can you provide more detail?  What is the exception?
 A reproducible test case would be great to have.

Thanks,

Doug

Re: Avro versioning and SpecificDatum's

Posted by Scott Carey <sc...@apache.org>.
What if you don't specify the schemas?

The writer schema is in the data file, and configured automatically if
unset.
The reader schema is in the class, and configured automatically in the
SpecificDatumReader constructor.
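[Editorial aside: to make the resolution rule concrete, here is a toy, stdlib-only Python sketch of how Avro matches record fields by name and alias during schema resolution. This is an illustration of the rule, not Avro's actual implementation; the schemas mirror the v1/v2 example from earlier in the thread, with a numeric default.]

```python
def resolve_record(writer_fields, reader_fields, datum):
    """Toy model of Avro record resolution.

    writer_fields/reader_fields are lists of dicts with "name" and optional
    "aliases" and "default" keys; datum maps writer field names to values.
    """
    result = {}
    writer_names = [f["name"] for f in writer_fields]
    for rf in reader_fields:
        # A reader field matches a writer field by its own name,
        # or by one of its declared aliases.
        candidates = [rf["name"]] + rf.get("aliases", [])
        match = next((n for n in candidates if n in writer_names), None)
        if match is not None:
            result[rf["name"]] = datum[match]
        elif "default" in rf:
            # Reader fields absent from the writer schema take their default.
            result[rf["name"]] = rf["default"]
        else:
            raise ValueError("no value or default for field %r" % rf["name"])
    return result

# v1 writer schema fields and a v1 datum:
v1 = [{"name": "name"}, {"name": "id"}]
datum = {"name": "r1", "id": 1}
# v2 reader schema fields, as in the thread (with a numeric default):
v2 = [{"name": "name_rename", "aliases": ["name"]},
      {"name": "new_field", "default": 0}]

print(resolve_record(v1, v2, datum))  # {'name_rename': 'r1', 'new_field': 0}
```

Under this rule "new_field" should come back as its default, not as the old
"id" value, which suggests the v1 writer schema was not being applied during
the run reported earlier in the thread.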



On 9/19/11 11:23 AM, "Rohini U" <ro...@gmail.com> wrote:

> I have also seen this issue when I write an avro object using
> SpecificDatumWriter and
> read it back using SpecificDatumReader, it complains saying that the schemas
> do not match even though I specify reader and writer schemas.
> 
> 
> On Mon, Sep 19, 2011 at 11:16 AM, Scott Carey <sc...@apache.org> wrote:
>> I version with SpecificDatum objects using avro data files and it works
>> fine.
>> 
>> I have seen problems arise if a user is configuring or reconfiguring the
>> schemas on the DatumReader passed into the construction of the
>> DataFileReader.
>> 
>> 
>> In the case of SpecificDatumReader, it is as simple as:
>> 
>> DatumReader<T> reader = new SpecificDatumReader<T>(T.class);
>> DataFileReader<T> fileReader = new DataFileReader(file, reader);
>> 
>> 
>> 
>> On 9/19/11 5:12 AM, "Alex Holmes" <gr...@gmail.com> wrote:
>> 
>>> >Hi,
>>> >
>>> >I'm starting to play with how I can support versioning with Avro.  I
>>> >created an initial schema, code-generated some Java classes using
>>> >"org.apache.avro.tool.Main compile protocol", and then used the
>>> >DataFileWriter (with a SpecificDatumWriter) to serialize my objects to
>>> >a file.
>>> >
>>> >I then modified my original schema by adding, deleting and renaming
>>> >some fields, creating version 2 of the schema.  After re-creating the
>>> >Java classes I attempted to read the version 1 file using the
>>> >DataFileStream (with a SpecificDatumReader), and this is throwing an
>>> >exception.
>>> >
>>> >Is versioning supported in conjunction with the SpecificDatum*
>>> >reader/writer classes, or do I have to work at the GenericDatum level
>>> >for this to work?
>>> >
>>> >Many thanks,
>>> >Alex
>> 
>> 
> 
> 
> 
> -- 
> Regards
> -Rohini
> 
> --  
>  
> People of accomplishment rarely sat back & let things happen to them. They
> went out & happened to things - Leonardo Da Vinci
> 
> 



Re: Avro versioning and SpecificDatum's

Posted by Rohini U <ro...@gmail.com>.
I have also seen this issue when I write an avro object using
SpecificDatumWriter and
read it back using SpecificDatumReader, it complains saying that the schemas
do not match even though I specify reader and writer schemas.


On Mon, Sep 19, 2011 at 11:16 AM, Scott Carey <sc...@apache.org> wrote:

> I version with SpecificDatum objects using avro data files and it works
> fine.
>
> I have seen problems arise if a user is configuring or reconfiguring the
> schemas on the DatumReader passed into the construction of the
> DataFileReader.
>
>
> In the case of SpecificDatumReader, it is as simple as:
>
> DatumReader<T> reader = new SpecificDatumReader<T>(T.class);
> DataFileReader<T> fileReader = new DataFileReader(file, reader);
>
>
>
> On 9/19/11 5:12 AM, "Alex Holmes" <gr...@gmail.com> wrote:
>
> >Hi,
> >
> >I'm starting to play with how I can support versioning with Avro.  I
> >created an initial schema, code-generated some Java classes using
> >"org.apache.avro.tool.Main compile protocol", and then used the
> >DataFileWriter (with a SpecificDatumWriter) to serialize my objects to
> >a file.
> >
> >I then modified my original schema by adding, deleting and renaming
> >some fields, creating version 2 of the schema.  After re-creating the
> >Java classes I attempted to read the version 1 file using the
> >DataFileStream (with a SpecificDatumReader), and this is throwing an
> >exception.
> >
> >Is versioning supported in conjunction with the SpecificDatum*
> >reader/writer classes, or do I have to work at the GenericDatum level
> >for this to work?
> >
> >Many thanks,
> >Alex
>
>
>


-- 
Regards
-Rohini

--
People of accomplishment rarely sat back & let things happen to them. They
went out & happened to things - Leonardo Da Vinci

Re: Avro versioning and SpecificDatum's

Posted by Scott Carey <sc...@apache.org>.
I version with SpecificDatum objects using avro data files and it works
fine.

I have seen problems arise if a user is configuring or reconfiguring the
schemas on the DatumReader passed into the construction of the
DataFileReader.


In the case of SpecificDatumReader, it is as simple as:

DatumReader<T> reader = new SpecificDatumReader<T>(T.class);
DataFileReader<T> fileReader = new DataFileReader(file, reader);



On 9/19/11 5:12 AM, "Alex Holmes" <gr...@gmail.com> wrote:

>Hi,
>
>I'm starting to play with how I can support versioning with Avro.  I
>created an initial schema, code-generated some Java classes using
>"org.apache.avro.tool.Main compile protocol", and then used the
>DataFileWriter (with a SpecificDatumWriter) to serialize my objects to
>a file.
>
>I then modified my original schema by adding, deleting and renaming
>some fields, creating version 2 of the schema.  After re-creating the
>Java classes I attempted to read the version 1 file using the
>DataFileStream (with a SpecificDatumReader), and this is throwing an
>exception.
>
>Is versioning supported in conjunction with the SpecificDatum*
>reader/writer classes, or do I have to work at the GenericDatum level
>for this to work?
>
>Many thanks,
>Alex