Posted to dev@parquet.apache.org by Karthikeyan Muthukumarasamy <mk...@gmail.com> on 2014/10/08 17:16:29 UTC

Using AvroParquetInputFormat to read files of multiple schemas in a single MR job

Hi,

I need to read multiple Avro Parquet files (each written with a different
Avro schema) in a single MR job.

AvroParquetInputFormat has only the static method setAvroReadSchema() for
setting the reader schema, so only one reader schema can be set per job.

I tried creating a union Avro schema (a union of the two individual Avro
schemas) and setting that as the read schema, but it turns out that in
AvroParquetInputFormat the top-level item in the Avro schema has to be a
record, not a union.
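
To make the attempt above concrete, here is a minimal sketch that looks only
at the schema JSON and makes no Parquet or Hadoop calls. In Avro's JSON
encoding a record is an object with "type": "record", while a union is a JSON
array of its branch schemas, so a union of two records is not itself a record
at the top level. The record names EventA and EventB, their fields, and the
helper is_top_level_record are hypothetical stand-ins, not my real schemas
and not part of any library.

```python
import json

# Hypothetical schemas standing in for the two files' writer schemas.
SCHEMA_A = {
    "type": "record",
    "name": "EventA",
    "fields": [{"name": "id", "type": "long"}],
}
SCHEMA_B = {
    "type": "record",
    "name": "EventB",
    "fields": [{"name": "label", "type": "string"}],
}

# In Avro's JSON encoding, a union is a JSON array of its branch schemas.
UNION_SCHEMA = [SCHEMA_A, SCHEMA_B]


def is_top_level_record(schema_json: str) -> bool:
    """Return True if the schema's top-level item is an Avro record.

    Hypothetical helper for illustration only; it checks the shape of the
    JSON and does not validate the schema.
    """
    schema = json.loads(schema_json)
    return isinstance(schema, dict) and schema.get("type") == "record"


if __name__ == "__main__":
    for name, schema in [("A", SCHEMA_A), ("B", SCHEMA_B), ("union", UNION_SCHEMA)]:
        print(name, is_top_level_record(json.dumps(schema)))
```

Each individual schema passes the check on its own, and the union does not,
which matches the restriction I ran into.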

How can I achieve my use case? Any suggestions or pointers would be most
appreciated.

Thanks & Regards

MK