Posted to user@avro.apache.org by Lloyd Haris <ll...@gmail.com> on 2016/01/19 19:47:50 UTC

Fwd: How to add new fields to schema and append new records - schema evolution

Hi,

Apologies if this has been asked before.

I've been trying to write a Parquet file using Avro, following the example in
Hadoop: The Definitive Guide, and it's working fine. My application is written
in Java and the file is saved on HDFS.
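For context, the Avro schema I'm working with looks roughly like this (the
record and field names here are just placeholders, not our real schema):

```json
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "message", "type": "string"}
  ]
}
```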

What I really want to do is learn how schema evolution works, and I am
evaluating whether we can do the following with Avro and Parquet.

I want to have a single Parquet file and first write a batch of records to
it. Then, as I receive more data, I would like to append those records to the
same file. First of all, I don't know whether this is even possible.

Second, we know our schema will evolve. For example, we might add new fields
to it, and I am wondering whether it's possible to write records with the new
schema into the same file that was originally written with the old schema.
What we basically want is to treat "the file" as a database.
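To make that concrete, the kind of change I mean is adding a field. From what
I've read of Avro's schema resolution rules, a newly added field needs a
default value so that records written with the old schema can still be read
with the new one, so I'd expect the evolved schema to look something like this
(again, the names are placeholders):

```json
{
  "type": "record",
  "name": "Event",
  "namespace": "com.example",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "message", "type": "string"},
    {"name": "category", "type": ["null", "string"], "default": null}
  ]
}
```

What I can't tell is whether Parquet lets me write records with this new
schema into a file that was created with the old one.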

Can somebody please tell me whether this is doable? If so, could you also
share some code samples? I couldn't find any examples that append new records
to an existing Parquet file using Avro, or any that change the schema and then
write records based on the new schema to that file.

Thanks

Lloyd