Posted to dev@iceberg.apache.org by Star P <st...@gmail.com> on 2021/08/26 20:23:54 UTC

Matching iceberg data types to Parquet data types

Hello Iceberg devs!

I was looking through the Parquet logical types
<https://github.com/apache/parquet-format/blob/master/LogicalTypes.md> and
the Iceberg data types, and I have a couple of questions.

1. Parquet allows storing timestamps with nanosecond precision. If I convert
an existing Parquet file with nanosecond-precision data to Iceberg, what will
be the data type of the column in Iceberg?
2. Parquet allows decimals of arbitrary precision, so a Parquet file may
have a FIXED_LEN_BYTE_ARRAY (FLBA) of length > 16 representing a decimal.
How is this handled in Iceberg?

Thanks!

Re: Matching iceberg data types to Parquet data types

Posted by Zoltán Borók-Nagy <bo...@cloudera.com.INVALID>.
Hi,

You can find information on the type mappings here:
https://iceberg.apache.org/spec/#parquet

1. Iceberg timestamps have microsecond precision. In Parquet they are
stored as INT64 values with the TIMESTAMP_MICROS annotation.
2. Iceberg limits decimal precision to 38:
https://iceberg.apache.org/spec/#primitive-types
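
To see how the precision limit of 38 relates to the FLBA length from the
question, here is a rough sketch in Python. It assumes the decimal is stored
as a two's-complement signed integer, as the Parquet logical types document
describes. The function names are made up for illustration and are not part
of any Iceberg or Parquet API.

```python
def max_precision_for_bytes(num_bytes: int) -> int:
    """Largest decimal precision whose every value fits in num_bytes
    of two's-complement storage (assumed encoding, see lead-in)."""
    largest_positive = 2 ** (8 * num_bytes - 1) - 1
    # A d-digit bound can hold every number with d - 1 digits.
    return len(str(largest_positive)) - 1


def min_bytes_for_precision(precision: int) -> int:
    """Smallest byte length that can hold any unscaled value
    with the given number of decimal digits."""
    num_bytes = 1
    while 2 ** (8 * num_bytes - 1) < 10 ** precision:
        num_bytes += 1
    return num_bytes


if __name__ == "__main__":
    print(max_precision_for_bytes(16))   # 38
    print(min_bytes_for_precision(38))   # 16
    print(min_bytes_for_precision(39))   # 17
```

Under that assumption, 16 bytes are exactly enough for precision 38, so an
FLBA longer than 16 bytes can carry a precision above the Iceberg limit.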

Doing anything non-standard (using different Parquet types to store
timestamps, having decimals with precision > 38) is not guaranteed to work.
It might work with one implementation but not with others. For example,
Impala also limits decimal precision to 38.
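
If someone decided to rewrite nanosecond data to microseconds before
importing it, the arithmetic could look like the sketch below. This is only
an illustration, not something Iceberg does for you. The helper names are
hypothetical, and rounding toward negative infinity is an assumption; the
thread does not say which rounding should be used.

```python
NANOS_PER_MICRO = 1_000


def nanos_to_micros(ts_nanos: int) -> int:
    """Convert nanoseconds since the epoch to microseconds since the epoch.

    Floor division rounds toward negative infinity, so timestamps before
    the epoch stay ordered correctly (assumed rounding, see lead-in).
    """
    return ts_nanos // NANOS_PER_MICRO


def is_lossless(ts_nanos: int) -> bool:
    """True if the value has no sub-microsecond component to lose."""
    return ts_nanos % NANOS_PER_MICRO == 0


if __name__ == "__main__":
    ts = 1_630_009_434_123_456_789  # hypothetical nanosecond timestamp
    print(nanos_to_micros(ts))      # 1630009434123456
    print(is_lossless(ts))          # False
```

Checking is_lossless over a column first would tell you whether the
conversion throws any information away.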

Cheers,
    Zoltan

