Posted to dev@parquet.apache.org by Arup Malakar <am...@gmail.com> on 2019/05/02 02:32:41 UTC

Writing INT96 timestamp in parquet from either avro/protobuf records

Hi parquet-dev,

We have existing Parquet files that were generated from JSON using Hive,
with timestamps stored as INT96. We are reworking the pipeline to use
Flink's StreamingFileSink to generate Parquet files from Protobuf (or Avro)
records, but from my research I am unable to find a way to write INT96
columns to Parquet from either Avro or Protobuf. We would like to keep the
same on-disk datatype for historical and new data, so we would like to
stick with INT96. Any suggestions on how to achieve that?

-- 
Arup Malakar

Re: Writing INT96 timestamp in parquet from either avro/protobuf records

Posted by Julien Le Dem <ju...@wework.com.INVALID>.
Hi Arup,
You are correct: you would have to use the lower-level APIs or contribute
INT96 support to either the Protobuf or Avro integration. However, we are
recommending that users migrate away from the INT96 type, so I would not
recommend adding that support.
https://issues.apache.org/jira/browse/PARQUET-323
Check how the tools you use to query that data interpret INT96 and INT64;
you might find that moving to the new type is the better solution and that
it remains compatible.
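
To make that comparison concrete, here is a minimal sketch (illustrative only, not parquet-mr code; the helper name is mine) of how an INT96 value relates to the INT64 epoch-based representation, assuming the conventional 12-byte layout: an 8-byte little-endian nanoseconds-of-day field followed by a 4-byte little-endian Julian day number.

```python
import struct

JULIAN_EPOCH_DAY = 2440588  # Julian day number of 1970-01-01 (Unix epoch)

def int96_to_epoch_micros(raw: bytes) -> int:
    """Decode a 12-byte INT96 timestamp into microseconds since the Unix
    epoch, i.e. the value an INT64 TIMESTAMP_MICROS column would hold."""
    # <q = little-endian int64 (nanoseconds within the day),
    # <i = little-endian int32 (Julian day number)
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    days_since_epoch = julian_day - JULIAN_EPOCH_DAY
    return days_since_epoch * 86_400_000_000 + nanos_of_day // 1_000
```

Decoding a few stored values this way and comparing them with what the query engine reports for the same rows is one way to verify that a move to INT64 would stay compatible.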

On Fri, May 3, 2019 at 11:34 AM Arup Malakar <am...@gmail.com> wrote:


Re: Writing INT96 timestamp in parquet from either avro/protobuf records

Posted by Arup Malakar <am...@gmail.com>.
Following up on the thread: my current understanding is that INT96 is not a
native type in either Protobuf or Avro, so the corresponding high-level
Parquet writers don't support it. However, `INT96` is supported by the
low-level Parquet writer APIs. I was able to generate Parquet files with
INT96 using the examples from:
https://stackoverflow.com/questions/54657496/how-to-write-timestamp-logical-type-int96-to-parquet-using-parquetwriter

Arup
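
For anyone following the same route, the 12-byte INT96 value those low-level writer examples produce can be sketched in isolation. This is an illustrative helper, not a parquet-mr API; it assumes the usual layout of 8 little-endian bytes of nanoseconds since midnight followed by a 4-byte little-endian Julian day number.

```python
import struct
from datetime import datetime

# Offset between Python's proleptic-Gregorian ordinal (0001-01-01 = 1)
# and the Julian day number (0001-01-01 = JDN 1721426).
ORDINAL_TO_JDN = 1721425

def timestamp_to_int96(dt: datetime) -> bytes:
    """Encode a datetime as Parquet's 12-byte INT96 timestamp:
    8-byte little-endian nanoseconds since midnight, then a
    4-byte little-endian Julian day number."""
    nanos_of_day = ((dt.hour * 3600 + dt.minute * 60 + dt.second)
                    * 1_000_000_000 + dt.microsecond * 1_000)
    julian_day = dt.toordinal() + ORDINAL_TO_JDN
    return struct.pack("<qi", nanos_of_day, julian_day)
```

The resulting 12 bytes are what would be handed to the low-level writer as the column value for an INT96 field.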



-- 
Arup Malakar