
Posted to user@avro.apache.org by Koert Kuipers <ko...@tresata.com> on 2011/12/02 04:32:41 UTC

reader in hadoop without reader's schema

I am reading from Avro container files in Hadoop. I know the container
files have a (writer's) schema stored in them. My reader specifies its
schema using the avro.input.schema job parameter. This way any schema
changes are handled gracefully, since both schemas are present.

However, I don't always need all this complexity. Is there a way to read
without having to specify a reader's schema, where I basically say "just
accept the writer's schema and read the data that way"?
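For reference, this is roughly what "both schemas present" means in plain Avro, outside Hadoop. It is a minimal sketch, assuming a recent Avro release on the classpath and two hypothetical schemas for a `User` record: the reader's schema drops the writer's `age` field and adds an `email` field with a default. The reader's schema is passed to `GenericDatumReader`, and `DataFileReader` supplies the writer's schema from the file header.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ReadWithReaderSchema {
  // Hypothetical writer's schema (what the data was written with).
  static final Schema WRITER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}");

  // Hypothetical reader's schema: drops "age", adds "email" with a default.
  static final Schema READER = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"email\",\"type\":\"string\",\"default\":\"unknown\"}]}");

  /** Writes two records; the writer's schema is stored in the file header. */
  static void writeSample(File file) throws IOException {
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(WRITER))) {
      writer.create(WRITER, file);
      for (int i = 0; i < 2; i++) {
        GenericRecord r = new GenericData.Record(WRITER);
        r.put("name", "user" + i);
        r.put("age", 30 + i);
        writer.append(r);
      }
    }
  }

  /** Reads the file, resolving the stored writer's schema against READER. */
  static List<String> readResolved(File file) throws IOException {
    List<String> out = new ArrayList<>();
    // DataFileReader sets the writer's schema from the header; the schema
    // given to the GenericDatumReader constructor stays as the reader's schema.
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>(READER))) {
      for (GenericRecord r : reader) {
        // "age" is skipped during resolution; "email" is filled from its default.
        out.add(r.get("name") + ":" + r.get("email"));
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("users", ".avro");
    file.deleteOnExit();
    writeSample(file);
    System.out.println(readResolved(file));
  }
}
```

Record names must match for resolution to succeed, which is why both hypothetical schemas are named `User`.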

Re: reader in hadoop without reader's schema

Posted by Doug Cutting <cu...@apache.org>.

On 12/06/2011 07:16 AM, Koert Kuipers wrote:
> What if I use AvroInputFormat? I tried setting the input schema to
> null, but that did not work.

Yes, it looks like that would not currently work.  Please file a Jira
issue if you require this.  It should be a simple modification to
AvroRecordReader.java, plus adding a test for it.
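The thread does not show what that change would look like. As a rough sketch of the idea only (this is not the actual AvroRecordReader code): treat an unset avro.input.schema as "no reader's schema" and build a `GenericDatumReader` without an expected schema, so the writer's schema from the file header is used. The class and method names are hypothetical, and a plain `Map` stands in for Hadoop's `JobConf`.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;

public class OptionalReaderSchema {
  // Job property named in the thread; the Map below stands in for JobConf.
  static final String INPUT_SCHEMA_KEY = "avro.input.schema";

  /**
   * Hypothetical helper: uses the configured reader's schema when present,
   * otherwise leaves the expected schema unset so the writer's schema applies.
   */
  static <T> GenericDatumReader<T> datumReaderFor(Map<String, String> conf) {
    String json = conf.get(INPUT_SCHEMA_KEY);
    if (json == null || json.trim().isEmpty()) {
      return new GenericDatumReader<T>();   // no reader's schema
    }
    return new GenericDatumReader<T>(new Schema.Parser().parse(json));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // Unset property: expected schema is null, so the file's schema will be used.
    System.out.println(datumReaderFor(conf).getExpected());
    conf.put(INPUT_SCHEMA_KEY, "\"string\"");
    // Set property: the parsed schema becomes the reader's schema.
    System.out.println(datumReaderFor(conf).getExpected());
  }
}
```

The returned reader could then be passed to `DataFileReader.openReader(...)` in the same way as in the snippet further down the thread.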

Thanks,

Doug

Re: reader in hadoop without reader's schema

Posted by Koert Kuipers <ko...@tresata.com>.

What if I use AvroInputFormat? I tried setting the input schema to
null, but that did not work.
On Dec 5, 2011 6:50 PM, "Doug Cutting" <cu...@apache.org> wrote:

> On 12/01/2011 07:32 PM, Koert Kuipers wrote:
> > I am reading from Avro container files in Hadoop. I know the container
> > files have a (writer's) schema stored in them. My reader specifies its
> > schema using the avro.input.schema job parameter. This way any schema
> > changes are handled gracefully, since both schemas are present.
> >
> > However, I don't always need all this complexity. Is there a way to read
> > without having to specify a reader's schema, where I basically say "just
> > accept the writer's schema and read the data that way"?
>
> That's what's done by default if you, e.g., do something like:
>
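> FileReader<Object> reader =
>     DataFileReader.openReader(file, new GenericDatumReader<Object>());
> try {
>   for (Object o : reader)   // records decoded with the writer's schema
>     System.out.println(o);
> } finally {
>   reader.close();
> }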
>
> Doug
>

Re: reader in hadoop without reader's schema

Posted by Doug Cutting <cu...@apache.org>.

On 12/01/2011 07:32 PM, Koert Kuipers wrote:
> I am reading from Avro container files in Hadoop. I know the container
> files have a (writer's) schema stored in them. My reader specifies its
> schema using the avro.input.schema job parameter. This way any schema
> changes are handled gracefully, since both schemas are present.
>
> However, I don't always need all this complexity. Is there a way to read
> without having to specify a reader's schema, where I basically say "just
> accept the writer's schema and read the data that way"?

That's what's done by default if you, e.g., do something like:

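FileReader<Object> reader =
    DataFileReader.openReader(file, new GenericDatumReader<Object>());
try {
  for (Object o : reader)   // records decoded with the writer's schema
    System.out.println(o);
} finally {
  reader.close();
}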

Doug
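The snippet above leaves out the surrounding setup. The following is a self-contained sketch, assuming a recent Avro release on the classpath and a hypothetical `User` schema used only to create a sample file: it writes a container file and then reads it back with a `GenericDatumReader` that is given no schema, so each record is decoded using the writer's schema stored in the file.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ReadWithWriterSchema {
  // Hypothetical schema, used only to produce a sample container file.
  static final Schema SCHEMA = new Schema.Parser().parse(
      "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
      + "{\"name\":\"name\",\"type\":\"string\"},"
      + "{\"name\":\"age\",\"type\":\"int\"}]}");

  /** Writes two records; the schema is stored in the file header. */
  static void writeSample(File file) throws IOException {
    try (DataFileWriter<GenericRecord> writer =
             new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
      writer.create(SCHEMA, file);
      for (int i = 0; i < 2; i++) {
        GenericRecord r = new GenericData.Record(SCHEMA);
        r.put("name", "user" + i);
        r.put("age", 30 + i);
        writer.append(r);
      }
    }
  }

  /** Reads every record without supplying a reader's schema. */
  static List<String> readAll(File file) throws IOException {
    List<String> out = new ArrayList<>();
    // The GenericDatumReader has no schema: DataFileReader hands it the writer's
    // schema from the header, which then also serves as the reader's schema.
    try (DataFileReader<GenericRecord> reader =
             new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
      System.out.println("schema from file: " + reader.getSchema());
      for (GenericRecord r : reader) {
        out.add(r.get("name") + ":" + r.get("age"));
      }
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    File file = File.createTempFile("users", ".avro");
    file.deleteOnExit();
    writeSample(file);
    System.out.println(readAll(file));
  }
}
```

Because no reader's schema is given, every field the writer stored (here both `name` and `age`) comes back unchanged; no projection or defaulting takes place.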