Posted to dev@nifi.apache.org by rohithkumars <ro...@gmail.com> on 2017/06/28 08:39:15 UTC

conversion from AVRO file format to Parquet file format

Hello Team,

Our team needs to convert flow file data from Avro to Parquet.

We found two processors to do that.

1. Get_Parquet 
2. Put_Parquet

Is it possible to convert the binary Avro file format to Parquet? The source
schema is not being generated automatically for Parquet.

We also tried creating the schema manually and writing the data as Parquet,
but with no luck.

Please guide us on this.

Thanks,
Rohith



--
View this message in context: http://apache-nifi-developer-list.39713.n7.nabble.com/conversion-from-AVRO-file-format-to-Parquet-file-format-tp16278.html
Sent from the Apache NiFi Developer List mailing list archive at Nabble.com.

Re: conversion from AVRO file format to Parquet file format

Posted by Bryan Bende <bb...@gmail.com>.
Rohith,

Can you share more details about how you have configured PutParquet?
What Record Reader are you using and what Schema Access Strategy?

If your data is already in Avro then you would need to set the Record
Reader to an AvroRecordReader. The AvroRecordReader can be configured
to use the schema from the Avro datafile, or from a schema registry.
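
As a rough illustration of what "the schema from the Avro datafile" means:
an Avro object container file stores the writer schema as JSON in its header
metadata under the key avro.schema. The sketch below reads that header with
only the Python standard library. The function name read_embedded_schema is
made up for this example; it is not part of NiFi or of any Avro library, and
this is not how the reader is implemented internally.

```python
import io
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def _read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    raw = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of Avro header")
        byte = chunk[0]
        raw |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    # Undo the zigzag mapping to recover the signed value.
    return (raw >> 1) ^ -(raw & 1)


def _read_bytes(stream):
    """Decode one Avro bytes/string value: a long length, then the data."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_embedded_schema(stream: io.BufferedIOBase):
    """Return the schema embedded in an Avro container file as a dict."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero block count ends the metadata map
            break
        if count < 0:  # negative count: a byte size follows, skip it
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return json.loads(metadata["avro.schema"].decode("utf-8"))
```

Opening a datafile in binary mode and passing the file object to
read_embedded_schema returns the schema as a dict, which can be handy for
checking which schema the reader would pick up from the file itself.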

Then you have to configure the write schema directly in PutParquet
through the 'Schema Access Strategy'. In the future there will be an
option to just write with the same schema as the reader, but currently
the read and write schemas are separate.

The easiest thing to do is probably to create an AvroSchemaRegistry
and add your schema to it, then have both the AvroRecordReader and
PutParquet reference it by name. That way they are guaranteed to use
the same schema.
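
For illustration, here is a minimal sketch of producing the schema text that
would be added to the registry under a name. The schema name example_record,
the namespace, and the fields are all hypothetical; substitute the fields of
your actual Avro data. The sketch assumes the registry holds each schema as
a named entry whose value is the Avro schema JSON.

```python
import json

# Hypothetical name under which the schema would be registered and which
# both the reader and PutParquet would then reference.
SCHEMA_NAME = "example_record"


def build_schema_text(name):
    """Return Avro record schema JSON for a made-up three-field record."""
    schema = {
        "type": "record",
        "name": name,
        "namespace": "org.example",  # hypothetical namespace
        "fields": [
            {"name": "id", "type": "long"},
            {"name": "title", "type": "string"},
            # Nullable field: a union with "null" first and a null default.
            {"name": "comment", "type": ["null", "string"], "default": None},
        ],
    }
    return json.dumps(schema, indent=2)


schema_text = build_schema_text(SCHEMA_NAME)
print(schema_text)
```

Because the reader and the writer both look the schema up by the same name,
there is only one copy of the schema text to maintain.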

Let us know if this does not make sense.

Thanks,

Bryan

