Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2020/08/04 04:30:22 UTC

Re: [DISCUSS] Support of higher bit-width Decimal type

Given no objections, we'll go ahead and start implementing support for
256-bit decimals.

I'm considering setting up another branch to develop all the components so
they can be merged to master atomically.

Thanks,
Micah

On Tue, Jul 28, 2020 at 6:39 AM Wes McKinney <we...@gmail.com> wrote:

> Generally this sounds fine to me. At some point it would be good to
> add 32-bit and 64-bit decimal support but this can be done in the
> future.
>
> On Tue, Jul 28, 2020 at 7:28 AM Fan Liya <li...@gmail.com> wrote:
> >
> > Hi Micah,
> >
> > Thanks for opening the discussion.
> > I am aware of some scenarios where decimal requires more than 16 bytes,
> > so I think it would be beneficial to support this in Arrow.
> >
> > Best,
> > Liya Fan
> >
> >
> > On Tue, Jul 28, 2020 at 11:12 AM Micah Kornfield <em...@gmail.com>
> > wrote:
> >
> > > Hi Arrow Dev,
> > > ZetaSQL (Google's open source standard SQL library) recently introduced
> > > a BigNumeric [1] type, which requires a 256-bit width to support it
> > > properly. I'd like to work (possibly in collaboration with some of my
> > > colleagues) on adding 256-bit-wide Decimals to Arrow to support a type
> > > corresponding to BigNumeric.
> > >
> > > In past discussions on this, I don't think we established a minimum bar
> > > for supporting additional bit widths within Arrow.
> > >
> > > I'd like to propose the following requirements:
> > > 1.  A vote agreeing on adding support for a new bit width (we can
> > > discuss any objections here).
> > > 2.  Support in Java and C++ for integration tests verifying the ability
> > > to round-trip the value.
> > > 3.  Support in Java for conversion to/from BigDecimal [2]
> > > 4.  Support in Python for conversion to/from Decimal [3]
> > >
> > > Is there anything else that people feel is a requirement for basic
> > > support of an additional bit width for Decimals?
> > >
> > > Thanks,
> > > Micah
> > >
> > >
> > > [1] https://github.com/google/zetasql/blob/1aefaa7c62fc7a50def879bb7c4225ec6974b7ef/zetasql/public/numeric_value.h#L486
> > > [2] https://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
> > > [3] https://docs.python.org/3/library/decimal.html
> > >
>
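
For a rough sense of why the proposal above asks for a wider type, here is a
minimal Python sketch that computes how many decimal digits are guaranteed to
fit in a signed two's-complement integer of a given bit width. The helper name
max_decimal_precision is purely illustrative and is not part of Arrow or
ZetaSQL.

```python
def max_decimal_precision(bit_width: int) -> int:
    """Largest p such that every p-digit decimal integer fits in a signed
    two's-complement integer of the given bit width."""
    max_value = 2 ** (bit_width - 1) - 1
    precision = 0
    # Keep growing p while the largest (p + 1)-digit number still fits.
    while 10 ** (precision + 1) - 1 <= max_value:
        precision += 1
    return precision


for width in (32, 64, 128, 256):
    print(width, max_decimal_precision(width))
# prints:
# 32 9
# 64 18
# 128 38
# 256 76
```

A 128-bit value tops out at 38 digits, so any decimal type that needs more
digits than that cannot be stored in the existing 16-byte layout.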

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Wes McKinney <we...@gmail.com>.
hi Micah -- this makes sense to me

On Fri, Sep 25, 2020 at 10:56 PM Micah Kornfield <em...@gmail.com> wrote:
>
> The decimal256 branch now contains sufficient implementations in Java and C++ to pass round-trip integration tests.  Some of the Python interop is missing (but is actively being worked on).
>
> I plan to create a PR to update the specification, with a corresponding vote, over the next couple of days.
>
> One thing Antoine brought up was whether it makes sense to merge the contents of the decimal256 branch to master sooner rather than later, to avoid accumulating a much larger PR.
>
> Thoughts?
>
> Thanks,
> Micah

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Micah Kornfield <em...@gmail.com>.
The decimal256 branch now contains sufficient implementations in Java and
C++ to pass round-trip integration tests.  Some of the Python interop is
missing (but is actively being worked on).

I plan to create a PR to update the specification, with a corresponding
vote, over the next couple of days.

One thing Antoine brought up was whether it makes sense to merge the
contents of the decimal256 branch to master sooner rather than later, to
avoid accumulating a much larger PR.

Thoughts?

Thanks,
Micah

On Sat, Aug 15, 2020 at 8:48 AM Wes McKinney <we...@gmail.com> wrote:

> On Fri, Aug 14, 2020 at 11:17 PM Micah Kornfield <em...@gmail.com>
> wrote:
> >
> > Hi Jacques,
> >
> > > Do we have a good definition of what is necessary to add a new data
> > > type? Adding a type but not pulling it through most of the code seems
> > > less than ideal since it means one part of Arrow doesn't work with
> > > another (providing a less optimal end-user experience).
> >
> > I think what I proposed below is a minimum viable integration plan (and
> > covers previously discussed requirements for new types). It demonstrates
> > interop between two reference implementations and allows conversion
> > to/from idiomatic language analogues.  So it covers the basic goal of
> > "arrow interop".
> >
> >
> > > For example, would this work include making Gandiva and all the kernels
> > > support this new data type where appropriate?
> >
> > Not initially.  There needs to be a stepping stone to start supporting
> > new types. I don't think it is feasible to try to land all of this
> > functionality in one PR.  I'll lend a hand at trying to get support for
> > built-in compute after we get the first part done.
>
> Since (I think?) there are other data types that Gandiva already does
> not support, trying to use decimal256 data with Gandiva would raise
> the same exception that it would raise with an unsupported type.
> Another option would be to insert an implicit cast to decimal128 as a
> stopgap.
>
> > Thanks,
> > Micah

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Wes McKinney <we...@gmail.com>.
On Fri, Aug 14, 2020 at 11:17 PM Micah Kornfield <em...@gmail.com> wrote:
>
> Hi Jacques,
>
> > Do we have a good definition of what is necessary to add a new data type?
> > Adding a type but not pulling it through most of the code seems less than
> > ideal since it means one part of Arrow doesn't work with another (providing
> > a less optimal end-user experience).
>
> I think what I proposed below is a minimum viable integration plan (and
> covers previously discussed requirements for new types). It demonstrates
> interop between two reference implementations and allows conversion to/from
> idiomatic language analogues.  So it covers the basic goal of "arrow
> interop".
>
>
> > For example, would this work include making Gandiva and all the kernels
> > support this new data type where appropriate?
>
> Not initially.  There needs to be a stepping stone to start supporting new
> types. I don't think it is feasible to try to land all of this
> functionality in one PR.  I'll lend a hand at trying to get support for
> built-in compute after we get the first part done.

Since (I think?) there are other data types that Gandiva already does
not support, trying to use decimal256 data with Gandiva would raise
the same exception that it would raise with an unsupported type.
Another option would be to insert an implicit cast to decimal128 as a
stopgap.
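
A minimal sketch of what such a stopgap cast would have to check, written in
plain Python over unscaled integer values. This is not Gandiva's or Arrow's
API; the function name narrow_unscaled_to_128 and the choice to raise
OverflowError are assumptions made only for illustration. The point is that
narrowing from 256 to 128 bits is only safe when every value already fits in
a signed 128-bit integer.

```python
INT128_MIN = -(2 ** 127)
INT128_MAX = 2 ** 127 - 1


def narrow_unscaled_to_128(values):
    """Return the values unchanged if each fits in a signed 128-bit integer.

    Raises OverflowError on the first value that does not fit, rather than
    silently truncating it (hypothetical behaviour for this sketch).
    """
    narrowed = []
    for v in values:
        if not INT128_MIN <= v <= INT128_MAX:
            raise OverflowError(f"{v} does not fit in 128 bits")
        narrowed.append(v)
    return narrowed


# Values with at most 38 digits pass through untouched.
print(narrow_unscaled_to_128([0, 10 ** 38 - 1, -(10 ** 38 - 1)]))
```

Values wider than 38 digits would make the cast fail, so it only helps for
data that happens to stay within the decimal128 range.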

> Thanks,
> Micah

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Micah Kornfield <em...@gmail.com>.
Hi Jacques,

> Do we have a good definition of what is necessary to add a new data type?
> Adding a type but not pulling it through most of the code seems less than
> ideal since it means one part of Arrow doesn't work with another (providing
> a less optimal end-user experience).

I think what I proposed below is a minimum viable integration plan (and
covers previously discussed requirements for new types). It demonstrates
interop between two reference implementations and allows conversion to/from
idiomatic language analogues.  So it covers the basic goal of "arrow
interop".
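
To make "conversion to/from idiomatic language analogues" concrete on the
Python side, here is a minimal sketch of a round trip between decimal.Decimal
and a fixed-width byte representation. It assumes the value is stored as an
unscaled two's-complement integer in little-endian byte order with a separate
scale; the function names are hypothetical and this is not the pyarrow
implementation.

```python
from decimal import Decimal


def decimal_to_bytes(value: Decimal, scale: int, byte_width: int = 32) -> bytes:
    """Encode value as a little-endian two's-complement unscaled integer."""
    if not value.is_finite():
        raise ValueError("NaN and infinities have no fixed-point encoding")
    sign, digits, exponent = value.as_tuple()
    shift = exponent + scale
    if shift < 0:
        raise ValueError("value has more fractional digits than the scale allows")
    unscaled = int("".join(map(str, digits))) * 10 ** shift
    if sign:
        unscaled = -unscaled
    # int.to_bytes raises OverflowError when the integer does not fit.
    return unscaled.to_bytes(byte_width, "little", signed=True)


def bytes_to_decimal(raw: bytes, scale: int) -> Decimal:
    """Decode bytes produced by decimal_to_bytes back into a Decimal."""
    unscaled = int.from_bytes(raw, "little", signed=True)
    sign = 1 if unscaled < 0 else 0
    digits = tuple(int(ch) for ch in str(abs(unscaled)))
    # Building from a tuple is exact and ignores the context precision.
    return Decimal((sign, digits, -scale))


# A 59-digit value: too wide for 16 bytes, fine in 32 bytes.
big = Decimal("-12345678901234567890123456789012345678901234567890.123456789")
raw = decimal_to_bytes(big, scale=9)
assert len(raw) == 32
assert bytes_to_decimal(raw, scale=9) == big
```

The sketch works on the digit tuple rather than on Decimal arithmetic so that
the default 28-digit context precision cannot round a wide value.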


> For example, would this work include making Gandiva and all the kernels
> support this new data type where appropriate?

Not initially.  There needs to be a stepping stone to start supporting new
types. I don't think it is feasible to try to land all of this
functionality in one PR.  I'll lend a hand at trying to get support for
built-in compute after we get the first part done.

Thanks,
Micah



On Fri, Aug 14, 2020 at 5:08 PM Jacques Nadeau <ja...@apache.org> wrote:

> Do we have a good definition of what is necessary to add a new data type?
> Adding a type but not pulling it through most of the code seems less than
> ideal since it means one part of Arrow doesn't work with another (providing
> a less optimal end-user experience).
>
> For example, would this work include making Gandiva and all the kernels
> support this new data type where appropriate?

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Jacques Nadeau <ja...@apache.org>.
Do we have a good definition of what is necessary to add a new data type?
Adding a type but not pulling it through most of the code seems less than
ideal since it means one part of Arrow doesn't work with another (providing
a less optimal end-user experience).

For example, would this work include making Gandiva and all the kernels
support this new data type where appropriate?

On Wed, Aug 5, 2020 at 12:01 PM Wes McKinney <we...@gmail.com> wrote:

> Sounds fine to me. I guess one question is what needs to be formalized
> in the Schema.fbs files or elsewhere in the columnar format
> documentation (and we will need to hold an associated vote for that I
> think)

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Micah Kornfield <em...@gmail.com>.
>
> Sounds fine to me. I guess one question is what needs to be formalized
> in the Schema.fbs files or elsewhere in the columnar format
> documentation (and we will need to hold an associated vote for that I
> think)

Yes, I think we will need to hold a vote for it.  Since this is essentially
a "new type", I was planning to develop the Schema/Documentation/Code
changes on the branch and then hold a vote to merge it to master.

On Wed, Aug 5, 2020 at 12:00 PM Wes McKinney <we...@gmail.com> wrote:

> Sounds fine to me. I guess one question is what needs to be formalized
> in the Schema.fbs files or elsewhere in the columnar format
> documentation (and we will need to hold an associated vote for that I
> think)

Re: [DISCUSS] Support of higher bit-width Decimal type

Posted by Wes McKinney <we...@gmail.com>.
Sounds fine to me. I guess one question is what needs to be formalized
in the Schema.fbs files or elsewhere in the columnar format
documentation (and I think we will need to hold an associated vote for
that).

On Mon, Aug 3, 2020 at 11:30 PM Micah Kornfield <em...@gmail.com> wrote:
>
> Given no objections, we'll go ahead and start implementing support for 256-bit decimals.
>
> I'm considering setting up another branch to develop all the components so they can be merged to master atomically.
>
> Thanks,
> Micah