Posted to dev@arrow.apache.org by Kun Liu <li...@apache.org> on 2023/01/04 08:58:21 UTC

[QUESTION][Parquet][Decimal] Why not implement the INT32/INT64 to store Decimal logical type in parquet file

Hi all,
   In the PR https://github.com/apache/arrow-rs/pull/3431, I want to write
decimal data with lower precision to INT32/INT64 in the parquet file.

   The Arrow C++ documentation on reading and writing Parquet files
<https://arrow.apache.org/docs/cpp/parquet.html#logical-types> requires
`(2) On the write side, a FIXED_LENGTH_BYTE_ARRAY is always emitted.`

   But in the Parquet format's definition of the decimal logical type
<https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal>,
a decimal can be represented by INT32/INT64 at lower precisions.
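
   As a rough illustration of that rule: the Parquet format document allows
INT32 for precisions 1 to 9 and INT64 for precisions 1 to 18, with
FIXED_LEN_BYTE_ARRAY (or BINARY) for anything wider. A minimal sketch of the
selection follows; `smallest_physical_type` is a hypothetical helper named
for this example, not an API of arrow-rs or Arrow C++.

```python
def smallest_physical_type(precision: int) -> str:
    """Pick the narrowest Parquet physical type the format allows for a decimal.

    Hypothetical helper for illustration; not part of any Arrow or Parquet library.
    """
    if precision < 1:
        raise ValueError("decimal precision must be at least 1")
    if precision <= 9:
        return "INT32"  # any 9-digit unscaled value fits in a signed 32-bit integer
    if precision <= 18:
        return "INT64"  # any 18-digit unscaled value fits in a signed 64-bit integer
    return "FIXED_LEN_BYTE_ARRAY"  # wider decimals need a byte-array representation


# Print the chosen physical type for a few representative precisions.
for p in (5, 9, 10, 18, 19, 38):
    print(p, smallest_physical_type(p))
```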

   Why do we not follow the Parquet definition
<https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal>
when writing the parquet file?

Thanks
Kun

Re: [QUESTION][Parquet][Decimal] Why not implement the INT32/INT64 to store Decimal logical type in parquet file

Posted by Gang Wu <ga...@apache.org>.
I have created an issue and will work on it: [C++][Parquet] Parquet writer
supports writing int32/int64 for decimal type · Issue #15239 · apache/arrow
(github.com) <https://github.com/apache/arrow/issues/15239>

Best,
Gang


On Sat, Jan 7, 2023 at 1:39 AM Micah Kornfield <em...@gmail.com>
wrote:

> Hi Kun,
>
> > The Arrow C++ documentation on reading and writing Parquet files
> > <https://arrow.apache.org/docs/cpp/parquet.html#logical-types> requires
> > `(2) On the write side, a FIXED_LENGTH_BYTE_ARRAY is always emitted.`
>
> I don't think this is a requirement; it is simply documenting current
> behavior.
>
> > Why do we not follow the Parquet definition
> > <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal>
> > when writing the parquet file?
>
> I think this was probably an issue of effort.  Given FLBA is more generic,
> there wasn't a need to write to other types.  Contributing an option to
> write out lower precision types to integers would be useful.
>
> Thanks,
> Micah

Re: [QUESTION][Parquet][Decimal] Why not implement the INT32/INT64 to store Decimal logical type in parquet file

Posted by Micah Kornfield <em...@gmail.com>.
Hi Kun,

> The Arrow C++ documentation on reading and writing Parquet files
> <https://arrow.apache.org/docs/cpp/parquet.html#logical-types> requires
> `(2) On the write side, a FIXED_LENGTH_BYTE_ARRAY is always emitted.`

I don't think this is a requirement; it is simply documenting current
behavior.

> Why do we not follow the Parquet definition
> <https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal>
> when writing the parquet file?


I think this was probably an issue of effort.  Given FLBA is more generic,
there wasn't a need to write to other types.  Contributing an option to
write out lower precision types to integers would be useful.
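
To make "more generic" concrete: a FIXED_LEN_BYTE_ARRAY of suitable length
can hold a decimal of any precision. Below is a small sketch of the minimum
length, assuming the bound given in the Parquet format document (n bytes hold
floor(log10(2^(8n - 1) - 1)) base-10 digits). `min_flba_length` is a
hypothetical name for this example, not a function in Arrow C++ or arrow-rs.

```python
def min_flba_length(precision: int) -> int:
    """Smallest byte length whose signed range covers `precision` decimal digits.

    Hypothetical helper for illustration; not part of any Arrow or Parquet library.
    """
    if precision < 1:
        raise ValueError("decimal precision must be at least 1")
    n = 1
    # n bytes suffice once the largest unscaled value, 10**precision - 1,
    # is no larger than the largest signed n-byte integer, 2**(8*n - 1) - 1.
    while 10**precision - 1 > 2 ** (8 * n - 1) - 1:
        n += 1
    return n


# Print the minimum byte length for a few representative precisions.
for p in (5, 9, 18, 38):
    print(p, min_flba_length(p))
```

Under this bound, precision 9 needs 4 bytes and precision 18 needs 8 bytes,
the same widths as INT32 and INT64.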

Thanks,
Micah

