You are viewing a plain text version of this content. The canonical link for it is here.
Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2022/09/07 16:54:00 UTC

[jira] [Comment Edited] (ARROW-17601) [C++] Error when creating Expression on Decimal128 types: precision out of range

    [ https://issues.apache.org/jira/browse/ARROW-17601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17601404#comment-17601404 ] 

Weston Pace edited comment on ARROW-17601 at 9/7/22 4:53 PM:
-------------------------------------------------------------

I think query engines and planners really like to know the output type before the plan is not executed.  Determining the output type based on the input is not something that is traditionally done (though maybe it is time to break with tradition :).

That being said, we could perhaps model ourselves after Substrait (which itself, modeled its rules after Calcite which is maybe based on Spark's rules which is probably modeled after something else...)

In Substrait, if a limit is hit when determining the output, then scale is dropped (instead of outright rejection).  It is first dropped from the decimal places.  Once the decimal places have been reduced to 6 digits it begins to drop precision from the non-decimal digits.

So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields DECIMAL<49, 6> which truncates to DECIMAL<38, 6>

Here is how it is explained in TSQL https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16

{quote}
In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to store integral part of the decimal number. If there isn't enough space to store them that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is reduced to provide enough space for integral part. Resulting scale is MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be rounded to fit into the resulting scale.

In multiplication and division operations, we need precision - scale places to store the integral part of the result. The scale might be reduced using the following rules:

    The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the integral part is less than 32, because it can't be greater than 38 - (precision-scale). Result might be rounded in this case.
    The scale won't be changed if it's less than 6 and if the integral part is greater than 32. In this case, overflow error might be raised if it can't fit into decimal(38, scale)
    The scale will be set to 6 if it's greater than 6 and if the integral part is greater than 32. In this case, both integral part and scale would be reduced and resulting type is decimal(38,6). Result might be rounded to 6 decimal places or the overflow error will be thrown if the integral part can't fit into 32 digits.
{quote}


was (Author: westonpace):
I think query engines and planners really like to know the output type before the plan is not executed.  Determining the output type based on the input is not something that is traditionally done (though maybe it is time to break with tradition :).

That being said, we could perhaps model ourselves after Substrait (which itself, modeled its rules after Calcite which is maybe based on Spark's rules which is probably modeled after something else...)

In Substrait, if a limit is hit when determining the output, then precision is dropped (instead of outright rejection).  It is first dropped from the decimal places.  Once the decimal places have been reduced to 6 digits it begins to drop precision from the non-decimal digits.

So, by Substrait rules, MULTIPLY(DECIMAL<33, 4>, DECIMAL<15, 2>) yields DECIMAL<49, 6> which truncates to DECIMAL<38, 6>

Here is how it is explained in TSQL https://docs.microsoft.com/en-us/sql/t-sql/data-types/precision-scale-and-length-transact-sql?view=sql-server-ver16

{quote}
In addition and subtraction operations, we need max(p1 - s1, p2 - s2) places to store integral part of the decimal number. If there isn't enough space to store them that is, max(p1 - s1, p2 - s2) < min(38, precision) - scale, the scale is reduced to provide enough space for integral part. Resulting scale is MIN(precision, 38) - max(p1 - s1, p2 - s2), so the fractional part might be rounded to fit into the resulting scale.

In multiplication and division operations, we need precision - scale places to store the integral part of the result. The scale might be reduced using the following rules:

    The resulting scale is reduced to min(scale, 38 - (precision-scale)) if the integral part is less than 32, because it can't be greater than 38 - (precision-scale). Result might be rounded in this case.
    The scale won't be changed if it's less than 6 and if the integral part is greater than 32. In this case, overflow error might be raised if it can't fit into decimal(38, scale)
    The scale will be set to 6 if it's greater than 6 and if the integral part is greater than 32. In this case, both integral part and scale would be reduced and resulting type is decimal(38,6). Result might be rounded to 6 decimal places or the overflow error will be thrown if the integral part can't fit into 32 digits.
{quote}

> [C++] Error when creating Expression on Decimal128 types: precision out of range
> --------------------------------------------------------------------------------
>
>                 Key: ARROW-17601
>                 URL: https://issues.apache.org/jira/browse/ARROW-17601
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Yibo Cai
>            Priority: Major
>
> Reproducer in R:
> {code}
> library(arrow)
> library(dplyr)
> tab <- Table$create(col1 = 1:4, col2 = 5:8)
> tab <- tab$cast(schema(col1 = decimal128(33, 4), col2 = decimal128(15, 2)))
> tab %>% mutate(col1 * col2)
> # Error: Invalid: Decimal precision out of range [1, 38]: 49
> # /Users/me/arrow/cpp/src/arrow/compute/kernels/scalar_arithmetic.cc:1078  DecimalType::Make(left_type.id(), precision, scale)
> # /Users/me/arrow/cpp/src/arrow/compute/exec/expression.cc:413  call.kernel->signature->out_type().Resolve(&kernel_context, types)
> {code}
> We don't have this problem integers and floats (see comment below). For consistency with the other arithmetic functions, what I would expect would be that we would expand the precision as much as we could within Decimal128–in this case, Decimal128(38, 6)–and the compute function would either error _if_ there is an overflow (in the _checked version) or just overflow in the non-checked version. But it wouldn't error on determining the output type.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)