You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@arrow.apache.org by "Chigarev, Dmitry" <dm...@intel.com> on 2020/10/08 19:11:04 UTC

[C++] Support of Decimal16, Decimal32 and Decimal64

Hi everyone, I would like to work on this JIRA ticket:
https://issues.apache.org/jira/browse/ARROW-9404
([C++] Add support for Decimal16, Decimal32 and Decimal64)

This will be my first experience with contributing to Arrow, so I want to ask advice what approach should I use.
As far as I know, currently, Arrow supports only Decimal128 and its basic and primary implementations located at `cpp/src/arrow/util/basic_decimal.h` and `.../util/decimal.h`. In current implementation 128-bit Decimal represented by two 64-bit integers. So there are several approaches that can be applied:

1. From current BasicDecimal128 class make a template class BasicDecimal<bit_width>, whose `low` and `high` variable types will be dependent on `bit_width` template parameter, implementation of methods also will be rewritten to depend on `bit_width`. As the result, we will have classes that work with `bit_width / 2`-bit integers.
The disadvantage of this approach is that even when we can fit our decimal in a single int value, we will still be splitting it into two variables and have to do that unnecessary extra logic of handling these two variables. But all Decimals will be the instances of one template class and will be consistent with each other.

2. Implement new template class BasicDecimal<bit_width> and Decimal<bit_width> which will work only for bit_width <= 64 (where we can represent our decimal with a single `int##bit_width##_t` variable), and also reimplement all methods of Decimals in this new class.
But that approach makes ambiguous what Decimal is, because technically Decimal64 and Decimal128 will be completely different classes which can create some inconsistency between them.

3. If we have some variable that indicates a maximum bit integer, then we can try to apply the following approach. Define template BasicDecimal<bit_width> class whose value will be represented not by ints variables, but by array of ints:
```
// Pseudo-code, may be incorrect
template<int width>
class BasicDecimal{
    using int_type = IntBitWidthTypes<max(width, MAX_INT_WIDTH)>::type;
    int_type values[max(width / MAX_INT_WIDTH, 1)];
// all of these can be computed at the compile time
....
};

using BasicDecimal128 = BasicDecimal<128>;
using BasicDecimal64 = BasicDecimal<64>;
....
```
In the result, Decimal128 will have an uint64_t array of 2 elements, Decimal64 will have an uint64_t array of 1 element, Decimal32 - uint32_t array of 1 element, and so on...
This also allows us to define decimals of arbitrary bitness. For example, Decimal256 will be represented as an array of uint64_t with 4 elements.
The bad side of this approach is its complexity - we need to rewrite the whole BasicDecimal and Decimal class.

Which one of these approaches will be the correct one?

P.S. I've just noticed that I'm not being able to assign JIRA tasks for myself, how can I do this?


--------------------------------------------------------------------
Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

Re: [C++] Support of Decimal16, Decimal32 and Decimal64

Posted by Micah Kornfield <em...@gmail.com>.

Hi Dmitry,
Thanks for volunteering to contribute.  Note that we are in the process of
implementing Decimal256 support already, which is currently on a separate
branch [1] but I'm hoping to have  PR sometime early next week.  If we are
proposing adding support for these lower bit-widths, I think we will likely
need an implementation in Java (maybe someone else is interested in doing
this on that side).  For reference [2] is the discussion for adding Decimal
256-bit width

Thanks,
Micah

[1] https://github.com/apache/arrow/tree/decimal256
[2]
https://mail-archives.apache.org/mod_mbox/arrow-dev/202007.mbox/%3CCAK7Z5T9VgsTmjD%3Dmdq1Sci5D0%2BVZPVKWzxCWfHErUdNjak1X9w%40mail.gmail.com%3E

On Thu, Oct 8, 2020 at 2:12 PM Wes McKinney <we...@gmail.com> wrote:

> Based in what I've seen in other systems (like Apache Kudu) that
> support at least 32/64-bit decimal, representing them with a single
> integer value is probably the best thing (in terms of computing
> performance, consistency with other implementations)
>
> I added you as a contributor on Jira so you can assign yourself issues now
>
> On Thu, Oct 8, 2020 at 2:33 PM Chigarev, Dmitry
> <dm...@intel.com> wrote:
> >
> > Hi everyone, I would like to work on this JIRA ticket:
> > https://issues.apache.org/jira/browse/ARROW-9404
> > ([C++] Add support for Decimal16, Decimal32 and Decimal64)
> >
> > This will be my first experience with contributing to Arrow, so I want
> to ask advice what approach should I use.
> > As far as I know, currently, Arrow supports only Decimal128 and its
> basic and primary implementations located at
> `cpp/src/arrow/util/basic_decimal.h` and `.../util/decimal.h`. In current
> implementation 128-bit Decimal represented by two 64-bit integers. So there
> are several approaches that can be applied:
> >
> > 1. From current BasicDecimal128 class make a template class
> BasicDecimal<bit_width>, whose `low` and `high` variable types will be
> dependent on `bit_width` template parameter, implementation of methods also
> will be rewritten to depend on `bit_width`. As the result, we will have
> classes that work with `bit_width / 2`-bit integers.
> > The disadvantage of this approach is that even when we can fit our
> decimal in a single int value, we will still be splitting it into two
> variables and have to do that unnecessary extra logic of handling these two
> variables. But all Decimals will be the instances of one template class and
> will be consistent with each other.
> >
> > 2. Implement new template class BasicDecimal<bit_width> and
> Decimal<bit_width> which will work only for bit_width <= 64 (where we can
> represent our decimal with a single `int##bit_width##_t` variable), and
> also reimplement all methods of Decimals in this new class.
> > But that approach makes ambiguous what Decimal is, because technically
> Decimal64 and Decimal128 will be completely different classes which can
> create some inconsistency between them.
> >
> > 3. If we have some variable that indicates a maximum bit integer, then
> we can try to apply the following approach. Define template
> BasicDecimal<bit_width> class whose value will be represented not by ints
> variables, but by array of ints:
> > ```
> > // Pseudo-code, may be incorrect
> > template<int width>
> > class BasicDecimal{
> >     using int_type = IntBitWidthTypes<max(width, MAX_INT_WIDTH)>::type;
> >     int_type values[max(width / MAX_INT_WIDTH, 1)];
> > // all of these can be computed at the compile time
> > ....
> > };
> >
> > using BasicDecimal128 = BasicDecimal<128>;
> > using BasicDecimal64 = BasicDecimal<64>;
> > ....
> > ```
> > In the result, Decimal128 will have an uint64_t array of 2 elements,
> Decimal64 will have an uint64_t array of 1 element, Decimal32 - uint32_t
> array of 1 element, and so on...
> > This also allows us to define decimals of arbitrary bitness. For
> example, Decimal256 will be represented as an array of uint64_t with 4
> elements.
> > The bad side of this approach is its complexity - we need to rewrite the
> whole BasicDecimal and Decimal class.
> >
> > Which one of these approaches will be the correct one?
> >
> > P.S. I've just noticed that I'm not being able to assign JIRA tasks for
> myself, how can I do this?
> >
> >
> > --------------------------------------------------------------------
> > Joint Stock Company Intel A/O
> > Registered legal address: Krylatsky Hills Business Park,
> > 17 Krylatskaya Str., Bldg 4, Moscow 121614,
> > Russian Federation
> >
> > This e-mail and any attachments may contain confidential material for
> > the sole use of the intended recipient(s). Any review or distribution
> > by others is strictly prohibited. If you are not the intended
> > recipient, please contact the sender and delete all copies.
>

Re: [C++] Support of Decimal16, Decimal32 and Decimal64

Posted by Wes McKinney <we...@gmail.com>.

Based in what I've seen in other systems (like Apache Kudu) that
support at least 32/64-bit decimal, representing them with a single
integer value is probably the best thing (in terms of computing
performance, consistency with other implementations)

I added you as a contributor on Jira so you can assign yourself issues now

On Thu, Oct 8, 2020 at 2:33 PM Chigarev, Dmitry
<dm...@intel.com> wrote:
>
> Hi everyone, I would like to work on this JIRA ticket:
> https://issues.apache.org/jira/browse/ARROW-9404
> ([C++] Add support for Decimal16, Decimal32 and Decimal64)
>
> This will be my first experience with contributing to Arrow, so I want to ask advice what approach should I use.
> As far as I know, currently, Arrow supports only Decimal128 and its basic and primary implementations located at `cpp/src/arrow/util/basic_decimal.h` and `.../util/decimal.h`. In current implementation 128-bit Decimal represented by two 64-bit integers. So there are several approaches that can be applied:
>
> 1. From current BasicDecimal128 class make a template class BasicDecimal<bit_width>, whose `low` and `high` variable types will be dependent on `bit_width` template parameter, implementation of methods also will be rewritten to depend on `bit_width`. As the result, we will have classes that work with `bit_width / 2`-bit integers.
> The disadvantage of this approach is that even when we can fit our decimal in a single int value, we will still be splitting it into two variables and have to do that unnecessary extra logic of handling these two variables. But all Decimals will be the instances of one template class and will be consistent with each other.
>
> 2. Implement new template class BasicDecimal<bit_width> and Decimal<bit_width> which will work only for bit_width <= 64 (where we can represent our decimal with a single `int##bit_width##_t` variable), and also reimplement all methods of Decimals in this new class.
> But that approach makes ambiguous what Decimal is, because technically Decimal64 and Decimal128 will be completely different classes which can create some inconsistency between them.
>
> 3. If we have some variable that indicates a maximum bit integer, then we can try to apply the following approach. Define template BasicDecimal<bit_width> class whose value will be represented not by ints variables, but by array of ints:
> ```
> // Pseudo-code, may be incorrect
> template<int width>
> class BasicDecimal{
>     using int_type = IntBitWidthTypes<max(width, MAX_INT_WIDTH)>::type;
>     int_type values[max(width / MAX_INT_WIDTH, 1)];
> // all of these can be computed at the compile time
> ....
> };
>
> using BasicDecimal128 = BasicDecimal<128>;
> using BasicDecimal64 = BasicDecimal<64>;
> ....
> ```
> In the result, Decimal128 will have an uint64_t array of 2 elements, Decimal64 will have an uint64_t array of 1 element, Decimal32 - uint32_t array of 1 element, and so on...
> This also allows us to define decimals of arbitrary bitness. For example, Decimal256 will be represented as an array of uint64_t with 4 elements.
> The bad side of this approach is its complexity - we need to rewrite the whole BasicDecimal and Decimal class.
>
> Which one of these approaches will be the correct one?
>
> P.S. I've just noticed that I'm not being able to assign JIRA tasks for myself, how can I do this?
>
>
> --------------------------------------------------------------------
> Joint Stock Company Intel A/O
> Registered legal address: Krylatsky Hills Business Park,
> 17 Krylatskaya Str., Bldg 4, Moscow 121614,
> Russian Federation
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.