Posted to user@avro.apache.org by Tomas Svarovsky <sv...@gmail.com> on 2014/02/16 00:22:03 UTC

Avro schema in Ruby API

Hey,

I wanted to ask a couple of questions.

1) Let's assume I have two Avro files. I would like to grab the schemas of
both, compare them, and decide what to do. The only way I found to get to
the schema in a reader is through

dr = Avro::DataFile::Reader.new(file, Avro::IO::DatumReader.new)
dr.meta

and that is still stringified JSON. Is this the only way? Is this use case
even supported, or should I do it differently?

2) Also, is it possible to read just the schema? Sometimes it is useful to
look at a file, let's say on S3, without actually reading the whole file.

Regards Tomas

Re: Avro schema in Ruby API

Posted by Tomas Svarovsky <sv...@gmail.com>.
Hey Harsh,

thanks. I can confirm that the first one works. Let me try the second one.

Tomas


On Sun, Feb 16, 2014 at 8:07 AM, Harsh J <ha...@cloudera.com> wrote:

> Hi,
>
> For (1) I believe you could do a "Schema.parse meta['avro.schema']" to
> obtain the schema as an object from the meta entry of the file.
>
> For (2), as defined in the spec at
> http://avro.apache.org/docs/current/spec.html#Object+Container+Files,
> since the schema is stored only in the header of the file, using a
> simple initialised reader will be efficient in reading just that. The
> file's data blocks are read only upon enumerating over the reader.
>
> --
> Harsh J
>

Re: Avro schema in Ruby API

Posted by Harsh J <ha...@cloudera.com>.
Hi,

For (1), I believe you could call
"Avro::Schema.parse(dr.meta['avro.schema'])" to obtain the schema as an
object from the meta entry of the file.
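
A minimal sketch of the comparison in (1), assuming each schema is available
as the stringified JSON held under the 'avro.schema' meta key. It uses only
Ruby's standard json library rather than the avro gem, and same_schema_json?
is a hypothetical helper name, not part of any Avro API. Parsing both strings
first means whitespace and key order do not affect the result:

```ruby
require 'json'

# Hypothetical helper (not part of the avro gem): compare two stringified
# schemas structurally instead of as raw strings.
def same_schema_json?(schema_a, schema_b)
  JSON.parse(schema_a) == JSON.parse(schema_b)
end

# Illustrative schemas, made up for this example.
a = '{"type":"record","name":"User","fields":[{"name":"id","type":"long"}]}'
b = '{ "name": "User", "type": "record", "fields": [ {"type": "long", "name": "id"} ] }'
c = '{"type":"record","name":"User","fields":[{"name":"id","type":"string"}]}'

puts same_schema_json?(a, b) # same structure, different formatting
puts same_schema_json?(a, c) # the field type differs
```

This treats any structural difference as a mismatch; it does not apply Avro's
schema resolution rules.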

For (2), as defined in the spec at
http://avro.apache.org/docs/current/spec.html#Object+Container+Files,
the schema is stored only in the header of the file, so simply
initialising a reader is an efficient way to read just that. The
file's data blocks are read only when you enumerate over the reader.
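
To illustrate why reading only the header is enough for (2), here is a rough
sketch that pulls the metadata map out of an object container file using only
the Ruby standard library. It follows the header layout given in the spec
(the magic bytes 'O', 'b', 'j', 1, then a map of metadata, then a 16-byte
sync marker). read_avro_long and read_header_meta are hypothetical helper
names, and the header built at the bottom is synthetic test data, not a real
file:

```ruby
require 'stringio'

# Hypothetical helper: decode one Avro long (zigzag-encoded varint).
def read_avro_long(io)
  acc = 0
  shift = 0
  loop do
    byte = io.readbyte
    acc |= (byte & 0x7f) << shift
    break if (byte & 0x80).zero?
    shift += 7
  end
  (acc >> 1) ^ -(acc & 1)
end

# Hypothetical helper: read the magic bytes and the metadata map, then stop.
# The sync marker and all data blocks are left unread.
def read_header_meta(io)
  magic = io.read(4)
  raise ArgumentError, 'not an Avro container file' unless magic == "Obj\x01".b

  meta = {}
  loop do
    count = read_avro_long(io)
    break if count.zero?          # a zero count ends the map
    if count.negative?            # negative count: a block byte size follows
      count = -count
      read_avro_long(io)
    end
    count.times do
      key = io.read(read_avro_long(io)).force_encoding('UTF-8')
      meta[key] = io.read(read_avro_long(io))
    end
  end
  meta
end

# Build a tiny synthetic header; small_long only handles values 0..63.
small_long = ->(n) { [n << 1].pack('C') }
schema = '{"type":"string"}'
key = 'avro.schema'
header = "Obj\x01".b +
         small_long.call(1) +
         small_long.call(key.bytesize) + key.b +
         small_long.call(schema.bytesize) + schema.b +
         small_long.call(0) +
         ("\x00".b * 16)          # placeholder sync marker

io = StringIO.new(header + 'data blocks would follow here')
meta = read_header_meta(io)
puts meta['avro.schema']
```

With any IO-like object that supports read and readbyte, the same idea
should apply: only the bytes up to the end of the metadata map are consumed.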

-- 
Harsh J