Posted to user@avro.apache.org by Ivan Tsyba <iv...@gmail.com> on 2022/07/20 08:37:23 UTC

GenericDatumReader writer's schema question

Hello

As stated in the Avro Getting Started guide
<https://avro.apache.org/docs/current/gettingstartedjava.html#Deserializing>
about deserialization without code generation: "The data will be read using the
writer's schema included in the file, and the reader's schema provided to
the GenericDatumReader". Here is how the GenericDatumReader is created in the
example:

DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);

But the Javadoc for this GenericDatumReader constructor states "Construct
where the writer's and reader's schemas are the same." (and the actual code
matches this).

So the writer's schema isn’t taken from a serialized file but from a
constructor parameter?
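A minimal sketch that checks the constructor behaviour described above, assuming the Avro Java library (1.x) is on the classpath; the `User` schema is a made-up example. In GenericDatumReader, `getSchema()` returns the writer's schema and `getExpected()` the reader's schema, and after the one-argument constructor both are the schema that was passed in.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ConstructorSchemas {
    // Hypothetical example schema, not taken from the Getting Started guide.
    static final Schema USER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\","
        + "\"fields\":[{\"name\":\"name\",\"type\":\"string\"}]}");

    /** Returns true if the one-argument constructor uses USER for both roles. */
    static boolean sameSchemaForBothRoles() {
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(USER);
        // getSchema() is the writer's schema, getExpected() the reader's schema.
        return reader.getSchema().equals(USER) && reader.getExpected().equals(USER);
    }

    public static void main(String[] args) {
        System.out.println(sameSchemaForBothRoles()); // prints "true"
    }
}
```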

Re: GenericDatumReader writer's schema question

Posted by Ivan Tsyba <iv...@gmail.com>.
Hello Oscar,
Yes, I've looked inside DataFileReader and now it's clear to me.
Thank you

On Fri, 22 Jul 2022 at 12:46, Oscar Westra van Holthe - Kind <oscar@westravanholthe.nl> wrote:

> Hi Ivan,
>
> You're correct about the GenericDatumReader javadoc, but the writer
> schema can be adjusted after creation. This is what the DataFileReader
> does.
>
> So after the DataFileReader is initialised, the underlying
> GenericDatumReader uses the schema in the file as the writer schema (to
> understand the data), and the schema you provided as the reader schema (to
> give data to you via dataFileReader.next(user)).
>
> Does that clarify things for you?
>
>
> Kind regards,
> Oscar
>
>
> --
>
> ✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>
>
>

Re: GenericDatumReader writer's schema question

Posted by Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>.
Hi Ivan,

You're correct about the GenericDatumReader javadoc, but the writer schema
can be adjusted after creation. This is what the DataFileReader does.
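For raw Avro binary without a container file (and therefore without an embedded schema), the caller has to supply the writer's schema. A minimal sketch of doing that by hand, assuming Avro 1.x on the classpath and two hypothetical versions of a `User` schema: the reader starts out from the one-argument constructor, and `setSchema(WRITER)` then replaces only the writer's schema. The data is resolved from `WRITER` to `READER`, so `favorite_number` is skipped and `favorite_color` gets its default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

public class SetWriterSchema {
    // Hypothetical schemas: WRITER is what the data was written with, READER what we want.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":\"int\"}]}");
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_color\",\"type\":\"string\",\"default\":\"green\"}]}");

    /** Encodes one record as raw Avro binary (no schema stored) using WRITER. */
    static byte[] encode() throws IOException {
        GenericRecord user = new GenericData.Record(WRITER);
        user.put("name", "Alyssa");
        user.put("favorite_number", 256);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        new GenericDatumWriter<GenericRecord>(WRITER).write(user, encoder);
        encoder.flush();
        return out.toByteArray();
    }

    /** Decodes with READER as the reader's schema, supplying WRITER after construction. */
    static GenericRecord decode(byte[] bytes) throws IOException {
        // Starts with READER in both roles, like the Getting Started example.
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(READER);
        // Replace only the writer's schema; the reader's (expected) schema stays READER.
        // Equivalent: new GenericDatumReader<>(WRITER, READER).
        reader.setSchema(WRITER);
        return reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null));
    }

    public static void main(String[] args) throws IOException {
        // favorite_number is skipped, favorite_color is filled from its default.
        System.out.println(decode(encode()));
    }
}
```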

So after the DataFileReader is initialised, the underlying
GenericDatumReader uses the schema in the file as the writer schema (to
understand the data), and the schema you provided as the reader schema (to
give data to you via dataFileReader.next(user)).
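The same mechanism with a data file, as a minimal sketch assuming Avro 1.x on the classpath and hypothetical `User` schemas. The datum reader is built from the reader's schema only, as in the Getting Started example; once `DataFileReader` has read the file header, `getSchema()` should report the schema stored in the file and `getExpected()` the schema passed to the constructor. The file is kept in memory via `SeekableByteArrayInput` so the sketch stays self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.file.SeekableByteArrayInput;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class FileSchemaDemo {
    // Hypothetical v1 (written) and v2 (read) versions of a "User" record.
    static final Schema WRITER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_number\",\"type\":\"int\"}]}");
    static final Schema READER = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"User\",\"namespace\":\"example.avro\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"favorite_color\",\"type\":\"string\",\"default\":\"green\"}]}");

    /** Writes one record to an in-memory Avro data file; the header stores WRITER. */
    static byte[] writeFile() throws IOException {
        GenericRecord user = new GenericData.Record(WRITER);
        user.put("name", "Alyssa");
        user.put("favorite_number", 256);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(WRITER))) {
            writer.create(WRITER, out);
            writer.append(user);
        }
        return out.toByteArray();
    }

    /** Reads every record; opening the file hands its stored schema to datumReader.setSchema. */
    static List<GenericRecord> readAll(byte[] file, GenericDatumReader<GenericRecord> datumReader)
            throws IOException {
        List<GenericRecord> users = new ArrayList<>();
        try (DataFileReader<GenericRecord> fileReader =
                 new DataFileReader<>(new SeekableByteArrayInput(file), datumReader)) {
            for (GenericRecord user : fileReader) {
                users.add(user); // resolved from the file's schema to READER
            }
        }
        return users;
    }

    public static void main(String[] args) throws IOException {
        // Only the reader's schema is passed in, as in the Getting Started guide.
        GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<>(READER);
        List<GenericRecord> users = readAll(writeFile(), datumReader);
        System.out.println("writer's schema: " + datumReader.getSchema());
        System.out.println("reader's schema: " + datumReader.getExpected());
        System.out.println(users);
    }
}
```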

Does that clarify things for you?


Kind regards,
Oscar


-- 

✉️ Oscar Westra van Holthe - Kind <os...@westravanholthe.nl>