Posted to user@avro.apache.org by Neil Davudo <ne...@yahoo.com> on 2011/12/06 18:31:48 UTC

schema by reference

Does Avro provide a mechanism to refer to the schema by reference?

Thanks,
Neil

Re: schema by reference

Posted by Doug Cutting <cu...@apache.org>.
Are you talking about RPC?  Earlier you said, "messages would be smaller
in size when we store large numbers of them", which led me to think
you're talking about some sort of data store.

If you're talking about RPC then there's already a reference passed, the
MD5 sum of the protocol text.  The client and/or server could maintain a
persistent database of these so that the text need never be transmitted.
If that's not appropriate then one could devise a different RPC
mechanism that instead uses, e.g., URLs.  Perhaps these could be
included in the handshake metadata of the existing RPC mechanism, as an
extension.

If you're talking about file-based storage, then Avro's data file format
already factors out the schema.  If you're talking about some other sort
of storage, then I'm not sure what modifications to Avro would be
required to support this.

Doug

On 12/06/2011 12:10 PM, Neil Davudo wrote:
> It would be nice if Avro had a way for the message to carry the URL of the schema, much like it can carry the schema within it. We could pass it separately out of band (e.g. in a header), but that weakens the link between the message and the URL of the schema.
> 
> Any thoughts on supporting this?
> 
> Neil

Re: schema by reference

Posted by Neil Davudo <ne...@yahoo.com>.
It would be nice if Avro had a way for the message to carry the URL of the schema, much like it can carry the schema within it. We could pass it separately out of band (e.g. in a header), but that weakens the link between the message and the URL of the schema.

Any thoughts on supporting this?

Neil


Re: schema by reference

Posted by Doug Cutting <cu...@apache.org>.
On 12/06/2011 11:14 AM, Neil Davudo wrote:
> Yes, by a URL. Messages would be smaller in size when we store large numbers of them, and we can always get the schema using the reference if necessary. Similar to what we can do with WSDL having a reference to the XSD.

This is a reasonable thing to do.

A schema can easily be constructed from a URL with:

Schema.parse(url.openStream())

although one would probably want a cache in front of this.
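
A minimal sketch of such a cache, assuming schemas are keyed by their URL string and never expire. The class name SchemaCache and the loader parameter are hypothetical, not part of Avro; the loader is whatever function fetches and parses the schema.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

/** Hypothetical helper: remembers one parsed schema per URL string. */
final class SchemaCache<S> {
    private final ConcurrentMap<String, S> byUrl = new ConcurrentHashMap<>();
    private final Function<String, S> loader;

    /** @param loader fetches and parses the schema found at a URL */
    SchemaCache(Function<String, S> loader) {
        this.loader = loader;
    }

    /** Returns the cached schema for the URL, invoking the loader only on first use. */
    S get(String url) {
        return byUrl.computeIfAbsent(url, loader);
    }
}
```

With Avro on the classpath, the loader could wrap Schema.parse(new URL(url).openStream()) and rethrow IOException as an unchecked exception. Because entries are never refreshed, this is only safe when the schema behind a given URL never changes.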

Note that in Avro one must ensure that the version of the schema at
the reference does not change, that it is identical to the version used
to write the datum.  So one should probably not use a logical URL
for a datatype like http://me.com/schemas/FooRecord but rather a unique
ID like http://me.com/schemas/9fd73.
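
One way to get an ID that cannot silently point at a changed schema is to derive it from the schema text itself. The sketch below assumes an MD5 digest of the UTF-8 schema text is used as the ID; the class SchemaIds and its methods are hypothetical names, and the base URL is only the example from above.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: derives a schema's ID from its own text. */
final class SchemaIds {
    private SchemaIds() {}

    /** Returns the MD5 digest of the schema text's UTF-8 bytes as 32 hex digits. */
    static String md5Hex(String schemaText) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(schemaText.getBytes(StandardCharsets.UTF_8));
            // Zero-pad so that leading zero bytes are not dropped.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Appends the digest to a base URL such as "http://me.com/schemas/". */
    static String urlFor(String baseUrl, String schemaText) {
        return baseUrl + md5Hex(schemaText);
    }
}
```

Any change to the text, including whitespace, yields a different ID, so the same logical schema formatted two ways would get two URLs unless the text is normalized first.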

If you're using a database (e.g., HBase) then you can have a table
of schemas, then, in other tables, store values annotated with the key
of the entry in the schema table.  https://github.com/spullara/havrobase
is one example of such an approach.
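
The table layout can be illustrated with in-memory maps standing in for the database tables. This is only a sketch of the idea and not how havrobase is implemented; SchemaAnnotatedStore, Row, and the method names are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a schema table plus a data table. */
final class SchemaAnnotatedStore {
    /** A stored value: the encoded datum plus the key of the schema that wrote it. */
    static final class Row {
        final String schemaKey;
        final byte[] datum;

        Row(String schemaKey, byte[] datum) {
            this.schemaKey = schemaKey;
            this.datum = datum;
        }
    }

    private final Map<String, String> schemaTable = new HashMap<>(); // schema key -> schema JSON
    private final Map<String, Row> dataTable = new HashMap<>();      // row key -> annotated value

    /** Registers a schema; a key may never be rebound to different text. */
    void putSchema(String schemaKey, String schemaJson) {
        String existing = schemaTable.putIfAbsent(schemaKey, schemaJson);
        if (existing != null && !existing.equals(schemaJson)) {
            throw new IllegalStateException("schema key already bound: " + schemaKey);
        }
    }

    /** Stores an encoded datum annotated with the key of its writer schema. */
    void putValue(String rowKey, String schemaKey, byte[] datum) {
        if (!schemaTable.containsKey(schemaKey)) {
            throw new IllegalArgumentException("unknown schema key: " + schemaKey);
        }
        dataTable.put(rowKey, new Row(schemaKey, datum));
    }

    /** Returns the schema JSON that wrote the row, or null if the row is absent. */
    String schemaFor(String rowKey) {
        Row row = dataTable.get(rowKey);
        return row == null ? null : schemaTable.get(row.schemaKey);
    }
}
```

Each stored value then carries only a short key, while the schema text is stored once per schema.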

Or one might use a URL shortener for this, e.g.:

http://tinyurl.com/8a4rppd

redirects to

avro:///?{"type":"record","name":"foo","fields":[]}

One could then install a URL handler for "avro" URLs that resolves them
to their query string.
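
The resolution step itself could look roughly like the following. It is a sketch that assumes the schema JSON follows "avro:///?" verbatim, without percent-encoding; AvroUrls and schemaTextOf are hypothetical names. Plain string handling is used because java.net.URI rejects unescaped characters such as braces and quotes.

```java
/** Hypothetical resolver for "avro" URLs that carry the schema in the query string. */
final class AvroUrls {
    private static final String PREFIX = "avro:///?";

    private AvroUrls() {}

    /** Returns everything after "avro:///?" unchanged, i.e. the schema text. */
    static String schemaTextOf(String url) {
        if (!url.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not an avro URL: " + url);
        }
        return url.substring(PREFIX.length());
    }
}
```

A full integration would wrap this in a java.net.URLStreamHandler registered for the "avro" protocol, so that url.openStream() returns the schema bytes.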

Doug

Re: schema by reference

Posted by Neil Davudo <ne...@yahoo.com>.
Yes, by a URL. Messages would be smaller in size when we store large numbers of them, and we can always get the schema using the reference if necessary. Similar to what we can do with WSDL having a reference to the XSD.

Neil



Re: schema by reference

Posted by Doug Cutting <cu...@apache.org>.
On 12/06/2011 09:31 AM, Neil Davudo wrote:
> Does Avro provide a mechanism to refer to the schema by reference?

E.g., by URL?  No, nothing like that is built in.

What is the use you have in mind for such a feature?

Doug