Posted to dev@avro.apache.org by Christophe Taton <ch...@gmail.com> on 2013/12/02 22:04:59 UTC
Effort towards Avro 2.0?
Hi all,
Avro, in its current form, exhibits a number of limitations that are hard
to work with or around, and hard to fix within the scope of Avro 1.x:
fixing these issues would introduce incompatible changes that warrant a
major version bump, i.e. Avro 2.0. An Avro 2.0 branch would be an
opportunity to address most of the issues that have been held back for
compatibility reasons so far.
I would like to initiate an effort in this direction and I am willing to do
the necessary work to gather and organize requirements, and draft a design
document for what Avro 2.0 would look like. For this reason, if you have
opinions regarding an Avro 2.0 branch or regarding issues and features that
could fit in Avro 2.0, please reply to this thread.
To bootstrap, below is a list I gathered over the last couple of years from
several discussions:
- Specification
  - Improved support for unions (incompatible change with named unions and
    union properties).
  - New extension data type, similar to Protocol Buffers extensions
    (incompatible change).
  - Clear separation between Avro schema (data format) and specific API
    client concerns: for example, the way Avro strings are exposed through
    the Java API should not pollute the schema definition. Each particular
    Java client should configure its own decoders with the way it wants
    Avro strings to be represented.
  - Clarification of compatibility and type promotion (safe lossless
    conversions vs. best-effort lossy conversions): promoting int to float
    potentially loses precision, which is not necessarily acceptable for all
    clients. Avro decoders should let clients configure which mode they need.
- IDL
  - Generalized IDL for Avro schemas.
  - Support for recursive records.
  - Meta-schema: IDL definition for a schema.
- Java API
  - Truly immutable schema objects (no properties / hashcode mutation
    after construction).
  - Immutable records.
  - Complete record builder API (current record builders do not play
    well with nested records).
  - Complete generic API (there is currently no GenericUnion or
    GenericMap).
  - Improved union support: union values as java.lang.Object are less
    than ideal; union values could expose the union branch through an enum
    (nulls could be handled specifically).
- Python 3 support
- RPC
  - SASL support
- Full Python/Java parity and interoperability.
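To make the type promotion item under Specification concrete, here is a minimal Python sketch of why int to float is not always lossless, and of what a client-selectable promotion mode could look like. The names `to_float32`, `promote_int_to_float` and `lossless_only` are hypothetical and are not part of any existing Avro API; the sketch only relies on Avro's float being a 32-bit IEEE 754 value.

```python
import struct


def to_float32(n: int) -> float:
    """Round n to the nearest 32-bit IEEE 754 float (Avro's 'float' type)."""
    return struct.unpack("<f", struct.pack("<f", float(n)))[0]


def promote_int_to_float(n: int, lossless_only: bool = True) -> float:
    """Hypothetical int -> float promotion with a client-selected mode.

    lossless_only=True  : refuse promotions that would change the value.
    lossless_only=False : best-effort promotion, silently rounding.
    """
    promoted = to_float32(n)
    if lossless_only and int(promoted) != n:
        raise ValueError(f"promoting {n} to float would lose precision")
    return promoted


# A 32-bit float has a 24-bit significand, so 2**24 is exact but 2**24 + 1 is not.
print(promote_int_to_float(2**24))                           # 16777216.0
print(promote_int_to_float(2**24 + 1, lossless_only=False))  # 16777216.0
```

With `lossless_only=True`, the second call would raise instead of rounding, which is the kind of choice a decoder could leave to each client.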
Please comment on or extend this list. Given enough interest, I'll happily
digest the feedback and organize it into a document (most likely a wiki page?).
Thanks,
Christophe