Posted to issues@arrow.apache.org by "wirable23 (via GitHub)" <gi...@apache.org> on 2023/04/16 18:52:04 UTC

[GitHub] [arrow] wirable23 opened a new issue, #35166: pa.compute.sum result for decimal doesn't fit into precision/scale

wirable23 opened a new issue, #35166:
URL: https://github.com/apache/arrow/issues/35166

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   ```
   >>> import decimal
   >>> import pyarrow as pa
   >>> import pyarrow.compute  # ensures the pa.compute submodule is loaded
   >>> arr = pa.array([decimal.Decimal("9.999"), decimal.Decimal("1.234"), decimal.Decimal("1.234"), decimal.Decimal("1.234")])
   >>> sum_scalar = pa.compute.sum(arr)
   >>> sum_scalar
   <pyarrow.Decimal128Scalar: Decimal('13.701')>
   >>> sum_scalar.type
   Decimal128Type(decimal128(4, 3))
   ```
   
   The decimal "13.701" has five significant digits, so it cannot fit into a decimal type with precision 4 and scale 3. An error is raised when trying to create the scalar directly:
   
   ```
   >>> pa.scalar(decimal.Decimal("13.701"), type=pa.decimal128(4,3))
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "pyarrow\scalar.pxi", line 1100, in pyarrow.lib.scalar
     File "pyarrow\error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow\error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Decimal type with precision 5 does not fit into precision inferred from first array element: 4
   >>>
   ```
   
   So it seems the type of the returned scalar should be Decimal128Type(decimal128(5, 3)), not Decimal128Type(decimal128(4, 3)). It seems sum tries to preserve the original precision/scale of the array, even though the sum will likely not fit within those bounds.
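
The digit count behind this can be checked with the standard `decimal` module alone. This is a minimal sketch; the helper name `required_precision` is hypothetical and is not a pyarrow API.

```python
from decimal import Decimal


def required_precision(value, scale):
    """Smallest decimal precision that can hold `value` at `scale`.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    # Express the value with exactly `scale` fractional digits.
    quantized = value.quantize(Decimal(1).scaleb(-scale))
    # Precision counts every stored digit and is never below the scale.
    return max(len(quantized.as_tuple().digits), scale)


values = [Decimal("9.999"), Decimal("1.234"), Decimal("1.234"), Decimal("1.234")]
total = sum(values)

# Each input fits in precision 4, but the total needs one more digit.
widest_input = max(required_precision(v, 3) for v in values)
needed_for_sum = required_precision(total, 3)
```

Here `widest_input` is 4 while `needed_for_sum` is 5, which is why decimal128(4, 3) cannot represent the result.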
   
   
   
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] westonpace commented on issue #35166: pa.compute.sum result for decimal128 doesn't fit into precision/scale

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-1514990894

   Yikes.  That's definitely a bug.  I think the return type of sum should probably maximize the precision (`P`) parameter.
   
   E.g. `SUM(Decimal128<X,Y>) -> Decimal128<38,Y>` and `SUM(Decimal256<X,Y>) -> Decimal256<76,Y>`.  This matches the rules for Substrait as well as what SQL Server does.
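
The rule proposed above can be written out as a small type-mapping function. This is only a sketch of the suggestion in this comment, not the behavior of any released Arrow version; `proposed_sum_type` and `MAX_PRECISION` are hypothetical names, and types are modeled as plain `(bit_width, precision, scale)` tuples.

```python
# Maximum precision per decimal width, as stated in the comment above.
MAX_PRECISION = {128: 38, 256: 76}


def proposed_sum_type(bit_width, precision, scale):
    """Return the (bit_width, precision, scale) a sum would produce
    under the proposed rule: keep the scale, maximize the precision.

    Hypothetical helper for illustration only; not a pyarrow API.
    """
    if bit_width not in MAX_PRECISION:
        raise ValueError(f"unsupported decimal width: {bit_width}")
    if not 1 <= precision <= MAX_PRECISION[bit_width]:
        raise ValueError(f"invalid precision for decimal{bit_width}: {precision}")
    return (bit_width, MAX_PRECISION[bit_width], scale)


# The array from the original report is decimal128(4, 3).
reported_case = proposed_sum_type(128, 4, 3)
```

Under this rule the reported case would come back as decimal128(38, 3), which has room for 13.701.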



[GitHub] [arrow] wirable23 commented on issue #35166: pa.compute.sum result for decimal128 doesn't fit into precision/scale

Posted by "wirable23 (via GitHub)" <gi...@apache.org>.

wirable23 commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-1516831900

   Where would I go to find these rules? In the internal compute implementation? Should the decimal precision/scale semantics be publicly documented, since they seem fairly fundamental to the behavior of the API?



[GitHub] [arrow] rohanjain101 commented on issue #35166: pa.compute.sum result for decimal128 doesn't fit into precision/scale

Posted by "rohanjain101 (via GitHub)" <gi...@apache.org>.

rohanjain101 commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-1515204873

   @westonpace thanks for taking a look. Is there documentation for the precision/scale rules for decimal128 in the different APIs? For example, I found that addition follows these rules: https://docs.aws.amazon.com/redshift/latest/dg/r_numeric_computations201.html, but I couldn't find publicly documented precision/scale rules for other APIs like sum and concat_table. Does this information exist somewhere, or does it have to be found by looking at the internal compute implementation?
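
For reference, the addition rule on the linked Redshift page can be sketched as follows. The function name `add_result_type` is hypothetical, and the sketch leaves out the capping that applies once the result precision would exceed the maximum of 38.

```python
def add_result_type(p1, s1, p2, s2):
    """Result (precision, scale) for decimal(p1, s1) + decimal(p2, s2).

    Hypothetical helper for illustration only; not a pyarrow API.
    Overflow past the maximum precision is deliberately not handled.
    """
    # Keep as many fractional digits as the more precise operand.
    scale = max(s1, s2)
    # Widest integer part, plus one digit for a possible carry.
    precision = max(p1 - s1, p2 - s2) + 1 + scale
    return precision, scale


# Adding two decimal128(4, 3) values gains one digit of precision.
same_type = add_result_type(4, 3, 4, 3)
mixed_type = add_result_type(10, 2, 5, 4)
```

Here `same_type` is `(5, 3)`, the type the original report expected for the four-element sum, and `mixed_type` is `(13, 4)`.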



[GitHub] [arrow] westonpace commented on issue #35166: pa.compute.sum result for decimal128 doesn't fit into precision/scale

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-1523759195

   > Where would I go to find these rules? In the internal compute implementation?
   
   Yes, unfortunately.
   
   > Should the decimal precision/scale semantics be publicly documented, since they seem fairly fundamental to the behavior of the API?
   
   Yes, I'd definitely welcome more documentation on how this happens.  There isn't a great spot for it today beyond the function doc string itself (e.g. what you get back from `help(pc.sum)`) and this page: https://arrow.apache.org/docs/cpp/compute.html#compute-function-list.



[GitHub] [arrow] westonpace commented on issue #35166: pa.compute.sum result for decimal128 doesn't fit into precision/scale

Posted by "westonpace (via GitHub)" <gi...@apache.org>.

westonpace commented on issue #35166:
URL: https://github.com/apache/arrow/issues/35166#issuecomment-1516807351

   > but I couldn't find publicly documented precision/scale rules for other APIs like sum and concat_table. Does this information exist somewhere, or does it have to be found by looking at the internal compute implementation?
   
   I'm not sure there is any particular place that we document these rules.  If it isn't found in the function description then it is likely missing.

