Posted to dev@avro.apache.org by amit nanda <am...@gmail.com> on 2014/04/01 22:55:26 UTC

Dynamic Schema

I have very dynamic data that I want to write to an Avro file. The solution
I have is to store all of that data in memory, then calculate the schema,
and then start writing.

This forces the files to be small, because each file can only hold as much
data as fits in memory.

What I am looking for is a way to start writing data as and when it is
collected, but how should I compute the schema in this case? Can I change
the schema of an Avro file?

Thanks
Amit

Re: Dynamic Schema

Posted by Martin Kleppmann <mk...@linkedin.com>.
Hi Amit,

The Avro data file format requires the writer to know the schema from the start, because all records in the file are then written with the same schema. So there probably isn't an alternative to what you're doing -- to buffer as much as you can in memory, write it out to file when the memory buffer is full, and then start a new file.
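As a rough illustration of that buffer-then-roll-over approach, here is a minimal Python sketch. It assumes records are flat dicts holding only primitive values, and the names `infer_schema`, `RollingBuffer` and the `write_file` callback are hypothetical; in a real program the callback would wrap an actual Avro data file writer.

```python
# Python type -> Avro primitive name. bool must come before int,
# because isinstance(True, int) is True in Python.
_PRIMITIVES = [(bool, "boolean"), (int, "long"), (float, "double"),
               (str, "string"), (bytes, "bytes")]


def avro_type_of(value):
    """Return the Avro primitive type name for a Python value."""
    if value is None:
        return "null"
    for py_type, avro_name in _PRIMITIVES:
        if isinstance(value, py_type):
            return avro_name
    raise TypeError("unsupported value type: " + type(value).__name__)


def infer_schema(records, name="Batch"):
    """Infer one Avro record schema that covers every flat dict in `records`."""
    seen = {}  # field name -> Avro type names, in first-seen order
    for record in records:
        for key, value in record.items():
            types = seen.setdefault(key, [])
            avro_name = avro_type_of(value)
            if avro_name not in types:
                types.append(avro_name)
    fields = []
    for key, types in seen.items():
        # A field absent from some record has to be nullable.
        if any(key not in r for r in records) and "null" not in types:
            types.append("null")
        if "null" in types:
            # "null" goes first so that a null default is valid for the union.
            ordered = ["null"] + [t for t in types if t != "null"]
            fields.append({"name": key,
                           "type": ordered if len(ordered) > 1 else "null",
                           "default": None})
        else:
            fields.append({"name": key,
                           "type": types[0] if len(types) == 1 else types})
    return {"type": "record", "name": name, "fields": fields}


class RollingBuffer:
    """Buffer records in memory; on overflow, emit one file's worth of data."""

    def __init__(self, max_records, write_file):
        self.max_records = max_records
        self.write_file = write_file  # hypothetical: (file_index, schema, rows)
        self.records = []
        self.file_index = 0

    def append(self, record):
        self.records.append(record)
        if len(self.records) >= self.max_records:
            self.flush()

    def flush(self):
        if not self.records:
            return
        schema = infer_schema(self.records)
        # Fill absent fields with None so every row matches the schema.
        rows = [{f["name"]: r.get(f["name"]) for f in schema["fields"]}
                for r in self.records]
        self.write_file(self.file_index, schema, rows)
        self.file_index += 1
        self.records = []
```

Each call to `write_file` corresponds to one complete data file with its own schema, so the schema is only ever computed over what is currently in memory. The buffer limit here is a record count for simplicity; a byte-size estimate would work the same way.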

You can't change the schema of a data file once it has been written, but you can run a background process which merges several data files together, and writes the result to a new file. You can make the merged file's schema the union of all the input file schemas, or you can write some application-specific code which combines the schemas into one, and evolve all the records into that merged schema. This can be done by streaming through the files -- you don't need to keep all the data in memory.
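To make the second option a bit more concrete, here is a minimal Python sketch of one possible application-specific merge: record schemas are combined field by field, a field missing from any input becomes nullable, and records are evolved one at a time as they stream through. It assumes flat record schemas whose field types are primitives or unions of primitives; `merge_schemas`, `evolve` and `merge_streams` are hypothetical names, and reading and writing the actual files is left to whatever Avro library you use.

```python
def as_branches(avro_type):
    """Return the branches of an Avro type (a non-union has one branch)."""
    return list(avro_type) if isinstance(avro_type, list) else [avro_type]


def merge_schemas(schemas, name="Merged"):
    """Combine flat record schemas field by field into one record schema."""
    branches = {}  # field name -> type branches, in first-seen order
    for schema in schemas:
        for field in schema["fields"]:
            known = branches.setdefault(field["name"], [])
            for branch in as_branches(field["type"]):
                if branch not in known:
                    known.append(branch)
    fields = []
    for field_name, known in branches.items():
        in_every_schema = all(
            any(f["name"] == field_name for f in s["fields"]) for s in schemas)
        # A field that some input file lacks has to be nullable.
        if not in_every_schema and "null" not in known:
            known.append("null")
        if "null" in known:
            # "null" goes first so that a null default is valid for the union.
            ordered = ["null"] + [b for b in known if b != "null"]
            fields.append({"name": field_name,
                           "type": ordered if len(ordered) > 1 else "null",
                           "default": None})
        else:
            fields.append({"name": field_name,
                           "type": known[0] if len(known) == 1 else known})
    return {"type": "record", "name": name, "fields": fields}


def evolve(record, merged_schema):
    """Rewrite one record into the merged schema, filling in defaults."""
    return {f["name"]: record.get(f["name"], f.get("default"))
            for f in merged_schema["fields"]}


def merge_streams(inputs, name="Merged"):
    """`inputs` is a list of (schema, iterable of records) pairs.

    Returns the merged schema and a generator of evolved records, so only
    one record at a time needs to be held in memory.
    """
    merged = merge_schemas([schema for schema, _ in inputs], name)

    def rows():
        for _, records in inputs:
            for record in records:
                yield evolve(record, merged)

    return merged, rows()
```

Only the schemas have to be read up front, and those sit in each file's header; the records themselves pass through the generator one by one on their way to the merged output file.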

Martin


