Posted to user@flume.apache.org by "Kumar, Ashok 6. (Nokia - IN/Bangalore)" <as...@nokia.com> on 2017/07/17 06:24:40 UTC

Avro to Parquet conversion

Hi all,

I have Avro data coming from Kafka, and I want to convert it into Parquet using Flume. I am not sure how to do it. Can anyone help me out with this?

Regards,
Ashok

Re: Avro to Parquet conversion

Posted by Matt Sicker <bo...@gmail.com>.
I implemented something similar to this recently. What you can do is mount
a tmpfs, batch up Avro GenericRecords, write each batch to a Parquet file
on the tmpfs, and then read that file back into a byte[] to do with as you
wish.
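A minimal Java sketch of the tmpfs approach, not Matt's actual code. It assumes the parquet-avro and Hadoop client libraries are on the classpath and that the scratch directory you pass in is a tmpfs mount (for example /dev/shm on many Linux systems). The class name TmpfsParquetBatcher, the batch-size policy, and the file naming are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;

/** Hypothetical helper: buffers Avro records and turns each full batch into Parquet bytes. */
public class TmpfsParquetBatcher {
    private final java.nio.file.Path scratchDir; // assumed to be a tmpfs mount
    private final Schema schema;
    private final int batchSize;
    private final List<GenericRecord> batch = new ArrayList<>();

    public TmpfsParquetBatcher(java.nio.file.Path scratchDir, Schema schema, int batchSize) {
        this.scratchDir = scratchDir;
        this.schema = schema;
        this.batchSize = batchSize;
    }

    /** Buffers one record; returns a Parquet file's bytes once the batch is full, else null. */
    public byte[] add(GenericRecord record) throws IOException {
        batch.add(record);
        return batch.size() >= batchSize ? flush() : null;
    }

    /** Writes the buffered records to a Parquet file in scratchDir and returns its contents. */
    public byte[] flush() throws IOException {
        if (batch.isEmpty()) {
            return null;
        }
        java.nio.file.Path file = scratchDir.resolve("batch-" + UUID.randomUUID() + ".parquet");
        // Hadoop's local file system also writes a ".<name>.crc" checksum file beside it.
        java.nio.file.Path crc = scratchDir.resolve("." + file.getFileName() + ".crc");
        try {
            // A file: URI keeps the write local even if fs.defaultFS points at HDFS.
            try (ParquetWriter<GenericRecord> writer =
                     AvroParquetWriter.<GenericRecord>builder(new Path(file.toUri()))
                         .withSchema(schema)
                         .build()) {
                for (GenericRecord r : batch) {
                    writer.write(r);
                }
            } // close() writes the Parquet footer
            byte[] bytes = Files.readAllBytes(file);
            batch.clear();
            return bytes;
        } finally {
            // tmpfs is RAM-backed, so free the space as soon as the bytes are in memory.
            Files.deleteIfExists(file);
            Files.deleteIfExists(crc);
        }
    }
}
```

In a Flume pipeline this would presumably live inside a custom sink, which then ships the byte[] wherever it needs to go. Since the tmpfs is backed by memory, the batch size bounds how much RAM each flush uses; it should still be large, because tiny Parquet files lose most of the benefit of the columnar format.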

On 30 August 2017 at 13:17, Mike Percy <mp...@apache.org> wrote:

> I know that this reply is quite late. I'm not aware of any Flume Parquet
> writer that currently exists. If it were me, I would stream it to HDFS in
> Avro format and then use an ETL job (perhaps via Spark or Impala) to
> convert the Avro to Parquet in large batches. Parquet is well suited to
> large batches of records due to its columnar nature.
>
> Mike
>


-- 
Matt Sicker <bo...@gmail.com>

Re: Avro to Parquet conversion

Posted by Mike Percy <mp...@apache.org>.
I know that this reply is quite late. I'm not aware of any Flume Parquet
writer that currently exists. If it were me, I would stream it to HDFS in
Avro format and then use an ETL job (perhaps via Spark or Impala) to
convert the Avro to Parquet in large batches. Parquet is well suited to
large batches of records due to its columnar nature.
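Mike suggests Spark or Impala for the conversion step. As a simpler stand-in that needs no cluster, here is a hedged plain-Java sketch of the same batch idea using parquet-avro rather than Spark: it reads the Avro container files that have landed in one HDFS (or local) directory and rewrites them as a single Parquet file. The class and method names are hypothetical, and it assumes every input file shares the first file's schema and that the output file does not already exist.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

/** Hypothetical batch job: rewrites a directory of Avro container files as one Parquet file. */
public class AvroDirToParquet {

    /** Returns the number of records copied from inputDir's *.avro files into output. */
    public static long convert(Configuration conf, Path inputDir, Path output) throws IOException {
        FileSystem fs = inputDir.getFileSystem(conf);
        ParquetWriter<GenericRecord> writer = null;
        long count = 0;
        try {
            FileStatus[] inputs = fs.listStatus(inputDir, p -> p.getName().endsWith(".avro"));
            for (FileStatus status : inputs) {
                try (InputStream in = fs.open(status.getPath());
                     DataFileStream<GenericRecord> avro =
                         new DataFileStream<>(in, new GenericDatumReader<GenericRecord>())) {
                    if (writer == null) {
                        // Assumption: every input file uses the first file's writer schema.
                        writer = AvroParquetWriter.<GenericRecord>builder(output)
                            .withSchema(avro.getSchema())
                            .withConf(conf)
                            .withCompressionCodec(CompressionCodecName.SNAPPY)
                            .build(); // fails if output already exists (default CREATE mode)
                    }
                    for (GenericRecord record : avro) {
                        writer.write(record);
                        count++;
                    }
                }
            }
        } finally {
            if (writer != null) {
                writer.close(); // writes the Parquet footer
            }
        }
        return count;
    }
}
```

The shape is the point: many small Avro files in, one large Parquet file out. Spark or Impala, as Mike suggests, would do the same conversion in parallel across a whole cluster.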

Mike
