Posted to user@avro.apache.org by John Bates <jo...@gmail.com> on 2012/06/06 22:08:04 UTC

robustness of Schema.Field.pos() across schema versions

Hi, all.

I'm trying to subclass an Avro IDL-generated class so that it may
implement an interface used by our project to deserialize data (and
not necessarily Avro data) from an InputStream.  Ideally, I'd like to
do something like this:

import java.io.IOException;
import java.io.InputStream;
import org.apache.avro.Schema;

public class MySubclass extends MyAvroGeneratedClass implements
    MySerializationInterface {
  @Override
  public void readObject(InputStream in) throws IOException,
      ClassNotFoundException {

    // AvroUtil.readObject exists and returns a SpecificRecord given
    // an InputStream and Schema
    MyAvroGeneratedClass other = (MyAvroGeneratedClass)
        AvroUtil.readObject(in, MyAvroGeneratedClass.SCHEMA$);

    // Is this correct?  Is it robust?
    // Copy every field from the freshly read record into this one,
    // addressing fields by their position in the schema.
    for (Schema.Field field : other.getSchema().getFields()) {
      int position = field.pos();
      this.put(position, other.get(position));
    }
  }
}

I'm trying to avoid having to use the setters and getters supplied by
the generated class, as that would require this subclass to remain in
sync with the IDL-generated class, which will probably be a point of
failure.

Is this approach robust to changes in the schema?  That is, if the
schema changes at some point in the future, will it be possible to
deserialize data that has been serialized with an older version of the
schema?  Is there a better (read: more correct, more robust, more
consistent with Avro's design) way to do this?

I sincerely appreciate your help - I've been blocked for a few days on this.

Thanks in advance,
John Bates

Re: robustness of Schema.Field.pos() across schema versions

Posted by John Bates <jo...@gmail.com>.
Thanks very much, Doug.  It seems as though we might abandon that
approach and only support Avro data, but I appreciate your thorough
response!

Thanks again,
bates

On Wed, Jun 6, 2012 at 3:46 PM, Doug Cutting <cu...@apache.org> wrote:
> John,
>
> I think this will work fine.  The schema in SCHEMA$ is in sync with
> the generated code.
>
> You should be able to avoid this copying by instead generating code
> with a different base class that contains your methods.  In
> particular, it should be easy to modify record.vm to instead use a
> subclass of SpecificRecordBase.  Templates are found on the classpath.
>
> We might also add a feature where one can specify an alternate base
> class through an API.  This might then be used by the Maven and Ant
> tasks.  If that approach sounds useful, please file an issue in Jira.
>
> Doug

Re: robustness of Schema.Field.pos() across schema versions

Posted by Doug Cutting <cu...@apache.org>.
John,

I think this will work fine.  The schema in SCHEMA$ is in sync with
the generated code.
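
To illustrate why copying by position is safe here, below is a minimal sketch that does not use Avro at all. MiniSchema, MiniRecord and PositionCopySketch are hypothetical stand-ins invented for this example, not Avro classes. The assumption is that both records are built against the same schema, as is the case when the record read from the stream and the generated class both use SCHEMA$. The second method shows what would be needed if the two records used different schema versions, where a field's position may differ and matching has to happen by name.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a record schema: an ordered list of field names. */
final class MiniSchema {
  private final List<String> names;

  MiniSchema(String... names) {
    this.names = Arrays.asList(names);
  }

  List<String> names() { return names; }

  /** Position of a field within this schema, or -1 if the field is absent. */
  int pos(String name) { return names.indexOf(name); }
}

/** Hypothetical stand-in for a record whose values are stored by position. */
final class MiniRecord {
  final MiniSchema schema;
  private final Object[] values;

  MiniRecord(MiniSchema schema) {
    this.schema = schema;
    this.values = new Object[schema.names().size()];
  }

  Object get(int pos) { return values[pos]; }
  void put(int pos, Object value) { values[pos] = value; }
}

final class PositionCopySketch {
  /** Safe only when both records were built against the same schema. */
  static void copyByPosition(MiniRecord from, MiniRecord to) {
    for (String name : from.schema.names()) {
      int pos = from.schema.pos(name);
      to.put(pos, from.get(pos));
    }
  }

  /** Tolerates reordered, added, or removed fields by matching on name. */
  static void copyByName(MiniRecord from, MiniRecord to) {
    for (String name : to.schema.names()) {
      int src = from.schema.pos(name);
      if (src >= 0) {
        to.put(to.schema.pos(name), from.get(src));
      }
    }
  }
}
```

With Avro itself, matching fields by name across schema versions is the job of schema resolution in the reader, so it only has to be done by hand if records built against two different schemas are being copied directly.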

You should be able to avoid this copying by instead generating code
with a different base class that contains your methods.  In
particular, it should be easy to modify record.vm to instead use a
subclass of SpecificRecordBase.  Templates are found on the classpath.
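
As a rough sketch of the shape this could take, the example below puts the project's readObject method in a shared base class so that the generated class inherits it and no copying subclass is needed. Everything here is an assumption made for illustration: RecordBaseStandIn stands in for SpecificRecordBase so the example compiles without Avro, MyRecordBase and GeneratedUser are hypothetical names, and the decoding step is a placeholder (one UTF string per field) rather than real Avro decoding.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Project interface, named after the one in the question (shape assumed). */
interface MySerializationInterface {
  void readObject(InputStream in) throws IOException, ClassNotFoundException;
}

/** Stand-in for SpecificRecordBase so this sketch needs no Avro dependency. */
abstract class RecordBaseStandIn {
  public abstract Object get(int pos);
  public abstract void put(int pos, Object value);
  public abstract int fieldCount();
}

/** Hypothetical base class that a modified record.vm would name as the parent. */
abstract class MyRecordBase extends RecordBaseStandIn
    implements MySerializationInterface {
  @Override
  public void readObject(InputStream in) throws IOException {
    // Placeholder decoding: one UTF string per field. A real version would
    // run an Avro decoder against the record's own schema here.
    DataInputStream data = new DataInputStream(in);
    for (int pos = 0; pos < fieldCount(); pos++) {
      put(pos, data.readUTF());
    }
  }
}

/** What generated code might look like once the template extends MyRecordBase. */
final class GeneratedUser extends MyRecordBase {
  private Object name;
  private Object email;

  @Override
  public int fieldCount() { return 2; }

  @Override
  public Object get(int pos) {
    switch (pos) {
      case 0: return name;
      case 1: return email;
      default: throw new IndexOutOfBoundsException("Bad index: " + pos);
    }
  }

  @Override
  public void put(int pos, Object value) {
    switch (pos) {
      case 0: name = value; break;
      case 1: email = value; break;
      default: throw new IndexOutOfBoundsException("Bad index: " + pos);
    }
  }
}
```

Because readObject lives in the base class and fills the record in place through put, it keeps working when fields are added to the schema and the class is regenerated.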

We might also add a feature where one can specify an alternate base
class through an API.  This might then be used by the Maven and Ant
tasks.  If that approach sounds useful, please file an issue in Jira.

Doug
