Posted to user@avro.apache.org by John Kristian <jk...@linkedin.com> on 2011/02/01 23:31:52 UTC

Object Schema Resolution

Can the Java implementation of Avro do schema resolution without deserializing?  For example, can it convert a generic object to a specific object with a different schema?  It seems possible: the specific class contains the reader’s schema, and the generic object contains the writer’s schema and data.  But I don’t see how to do it with Java, without serializing and deserializing.

This would be useful in some RPC servers, enabling low-level software to deserialize a generic object, which high-level software might subsequently resolve to a specific object.

- John Kristian

Re: Object Schema Resolution

Posted by Scott Carey <sc...@richrelevance.com>.
There is no current 'easy' API for this.

Internally there is machinery that could be applied to that problem.  You could look at Parser, ResolvingDecoder, ValidatingDecoder, ParsingDecoder, and ResolvingGrammarGenerator, but that code is complicated.  See http://avro.apache.org/docs/current/api/java/org/apache/avro/io/parsing/doc-files/parsing.html.  The ResolvingGrammarGenerator creates a sequence of 'steps' for interpreting data written with the writer's schema from the reader's perspective.
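
To make the 'steps' idea concrete, here is a deliberately simplified sketch in plain Java. It is not ResolvingGrammarGenerator and uses no Avro classes: it assumes a record schema is just an ordered list of field names and a record is an Object[] in that order, and it ignores nested types, unions, aliases, and type promotion. The class and method names (ResolutionSteps, plan, apply) are hypothetical. Fields are matched by name, writer-only fields become SKIP steps, and a reader-only field must have a default or resolution fails, which follows Avro's record-resolution rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of record-level schema resolution (hypothetical, NOT Avro's API).
// Assumes: schema = ordered list of field names, record = Object[] in schema
// order; no nested types, unions, aliases, or type promotion.
public class ResolutionSteps {

    enum Kind { COPY, SKIP, DEFAULT }

    // One step of a plan that reads writer data from the reader's perspective.
    static final class Step {
        final Kind kind;
        final int writerPos;        // -1 for DEFAULT steps
        final int readerPos;        // -1 for SKIP steps
        final Object defaultValue;  // only used by DEFAULT steps

        Step(Kind kind, int writerPos, int readerPos, Object defaultValue) {
            this.kind = kind;
            this.writerPos = writerPos;
            this.readerPos = readerPos;
            this.defaultValue = defaultValue;
        }

        @Override
        public String toString() {
            return kind + "(w=" + writerPos + ", r=" + readerPos + ")";
        }
    }

    // Walk writer fields in writer order (the order the data was written in),
    // matching each to a reader field by name; then append default steps for
    // reader-only fields.
    static List<Step> plan(List<String> writerFields, List<String> readerFields,
                           Map<String, Object> readerDefaults) {
        List<Step> steps = new ArrayList<>();
        for (int w = 0; w < writerFields.size(); w++) {
            int r = readerFields.indexOf(writerFields.get(w));
            steps.add(r >= 0 ? new Step(Kind.COPY, w, r, null)
                             : new Step(Kind.SKIP, w, -1, null));
        }
        for (int r = 0; r < readerFields.size(); r++) {
            String name = readerFields.get(r);
            if (writerFields.contains(name)) {
                continue;
            }
            if (!readerDefaults.containsKey(name)) {
                throw new IllegalArgumentException("reader-only field without default: " + name);
            }
            steps.add(new Step(Kind.DEFAULT, -1, r, readerDefaults.get(name)));
        }
        return steps;
    }

    // Execute the plan against an in-memory writer record.
    static Object[] apply(List<Step> steps, Object[] writerValues, int readerSize) {
        Object[] out = new Object[readerSize];
        for (Step s : steps) {
            switch (s.kind) {
                case COPY:
                    out[s.readerPos] = writerValues[s.writerPos];
                    break;
                case DEFAULT:
                    out[s.readerPos] = s.defaultValue;
                    break;
                case SKIP:
                    break; // writer-only field: the reader ignores it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> writer = Arrays.asList("id", "name", "legacyFlag");
        List<String> reader = Arrays.asList("name", "id", "email");
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("email", "");
        List<Step> steps = plan(writer, reader, defaults);
        // prints [COPY(w=0, r=1), COPY(w=1, r=0), SKIP(w=2, r=-1), DEFAULT(w=-1, r=2)]
        System.out.println(steps);
        // prints [ada, 42, ]
        System.out.println(Arrays.toString(apply(steps, new Object[] {42, "ada", true}, reader.size())));
    }
}
```

The plan depends only on the two schemas, so it can be computed once and reused for every record, which is the same reason Avro builds its resolving grammar up front rather than per datum.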

It turns out that Avro is more than a serialization system; it is also a schema migration and translation system.  You can map this problem to context-free grammars (CFGs) and other paradigms.  We do not yet have any "schema-tools" libraries that do more than serialization and deserialization.

This is an interesting use case that overlaps with several other similar problems:
You have a data structure that can be seen as corresponding to a schema, given a set of traversal rules.  You have another mapping from schema to data structure, and you want to map between them, including schema resolution.  It's worth filing a JIRA on this; who knows who might be interested in contributing such a set of tools?

In the medium term, we might have a different approach that will solve your specific problem:  objects are arriving in generic form and you want to access them in specific form.  A future version of Generic/Specific/Reflect might make this easier, perhaps by supporting some sort of 'resolving wrapper' object generation.  For one thing, Generic and Specific share the IndexedRecord interface, so that may not be too hard.
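
A 'resolving wrapper' could look roughly like the following sketch, which presents a writer-schema record through the reader schema's field positions without copying or re-serializing. Everything here is hypothetical: PositionalRecord is a stand-in for the get(int) side of IndexedRecord (not Avro's type), ArrayRecord stands in for a generic record, and schemas are modeled as ordered lists of field names, with fields matched by name and defaults used for reader-only fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical "resolving wrapper" sketch (NOT an Avro API). PositionalRecord
// stands in for the get(int) half of IndexedRecord; schemas are modeled as
// ordered lists of field names.
public class ResolvingWrapperDemo {

    interface PositionalRecord {
        Object get(int pos);
    }

    // Stand-in for a generic record holding values in writer-schema order.
    static final class ArrayRecord implements PositionalRecord {
        private final Object[] values;
        ArrayRecord(Object... values) { this.values = values; }
        @Override public Object get(int pos) { return values[pos]; }
    }

    // Read-only, reader-schema view over a writer-schema record. The field
    // mapping is computed once; values are fetched lazily on each get().
    static final class ResolvingView implements PositionalRecord {
        private final PositionalRecord writerRecord;
        private final int[] writerPosFor;   // reader position -> writer position, or -1
        private final Object[] defaults;    // used where writerPosFor[r] == -1

        ResolvingView(PositionalRecord writerRecord, List<String> writerFields,
                      List<String> readerFields, Map<String, Object> readerDefaults) {
            this.writerRecord = writerRecord;
            this.writerPosFor = new int[readerFields.size()];
            this.defaults = new Object[readerFields.size()];
            for (int r = 0; r < readerFields.size(); r++) {
                String name = readerFields.get(r);
                int w = writerFields.indexOf(name);   // match fields by name
                if (w < 0 && !readerDefaults.containsKey(name)) {
                    throw new IllegalArgumentException("reader-only field without default: " + name);
                }
                writerPosFor[r] = w;
                defaults[r] = (w < 0) ? readerDefaults.get(name) : null;
            }
        }

        // Writer-only fields are never looked up, so they are effectively skipped.
        @Override public Object get(int readerPos) {
            int w = writerPosFor[readerPos];
            return (w >= 0) ? writerRecord.get(w) : defaults[readerPos];
        }
    }

    public static void main(String[] args) {
        PositionalRecord generic = new ArrayRecord(42, "ada", true); // id, name, legacyFlag
        Map<String, Object> defaults = new HashMap<>();
        defaults.put("email", "");
        PositionalRecord view = new ResolvingView(generic,
                Arrays.asList("id", "name", "legacyFlag"),
                Arrays.asList("name", "id", "email"),
                defaults);
        // prints "ada 42 []"
        System.out.println(view.get(0) + " " + view.get(1) + " [" + view.get(2) + "]");
    }
}
```

A lazy view like this avoids copying the record, at the cost of an extra index lookup on every access; an eager copy makes the opposite trade. A real version would also have to handle nested records, unions, and type promotion, which this sketch leaves out.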

All good ideas worth putting in JIRA, though unfortunately none are implemented.

-Scott
