Posted to dev@parquet.apache.org by ALeX Wang <ee...@gmail.com> on 2018/04/20 00:39:53 UTC

Re: Question about my use case.

Sorry for the long-delayed reply.

I finally have time to work on this again, and yes, after taking a closer
look at the parquet-hadoop source code, I was able to simply write a custom
ParquetWriter using java.io.FileOutputStream for my use case.

We do not use Avro. All the data is in flat Java classes, and we want to
write directly into a Parquet file on the local filesystem.

Thanks,
Alex Wang

Re: Question about my use case.

Posted by Ryan Blue <rb...@netflix.com.INVALID>.
You might consider using Avro with Java classes. That would reduce the
amount of code you need because it can use reflection to work with your
classes. We don’t recommend building to the object model APIs unless you
need tighter integration with an existing processing engine. Here’s an
example from Parquet’s tests that shows how easy writing is:

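    // Derive the Avro schema from the Java class by reflection.
    Schema schema = ReflectData.get().getSchema(Pojo.class);
    // try-with-resources closes the writer, which finishes the file.
    try (ParquetWriter<Pojo> writer = AvroParquetWriter.<Pojo>builder(path)
        .withSchema(schema)
        .withDataModel(ReflectData.get())
        .build()) {
      for (int i = 0; i < num; i++) {
        writer.write(records.get(i));
      }
    }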

There’s also an example of the read side
<https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestReflectReadWrite.java#L47-L59>
in the tests. That’s probably easier to use and maintain.
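
As a rough illustration of what "use reflection to work with your classes"
means, the sketch below lists the fields of a flat Java class at runtime using
only the JDK. It is not how Avro or Parquet implement this; `ReflectSketch`,
`Pojo`, and `describe` are hypothetical names chosen for the example.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ReflectSketch {
  // Hypothetical flat data class standing in for an application class.
  static class Pojo {
    long id;
    String name;
    double score;
  }

  // Returns "name: type" for each instance field, sorted by field name,
  // because getDeclaredFields() does not guarantee any particular order.
  static List<String> describe(Class<?> type) {
    List<String> out = new ArrayList<>();
    for (Field f : type.getDeclaredFields()) {
      if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
        continue; // skip constants and compiler-generated fields
      }
      out.add(f.getName() + ": " + f.getType().getSimpleName());
    }
    Collections.sort(out);
    return out;
  }
}
```

Calling `ReflectSketch.describe(ReflectSketch.Pojo.class)` yields one entry per
field without any per-class mapping code, which is the kind of work a
reflection-based data model can take over when it derives a schema.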

rb


-- 
Ryan Blue
Software Engineer
Netflix