You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/04 09:03:27 UTC

[GitHub] [arrow] jorisvandenbossche commented on pull request #9948: ARROW-12150: [Python] Correctly infer type of mixed-precision Decimals

jorisvandenbossche commented on pull request #9948:
URL: https://github.com/apache/arrow/pull/9948#issuecomment-831791116


   For now, I pushed a minimal fix to ensure we don't raise for a different precision if we rescale the decimal. That at least fixes the existing error for eg `pa.array([Decimal('123E+2')])`.
   
   But still running into some some more corner cases regarding the precision/scale inference from a Python Decimal.
   
   Using released version of pyarrow, we have an inconsistency in how to deal with positive exponents:
   
   ```python
   >>> pa.infer_type([decimal.Decimal('123e4')])
   Decimal128Type(decimal128(7, 0))
   >>> pa.infer_type([decimal.Decimal('123e2')])
   Decimal128Type(decimal128(3, -2))
   ```
   
   Those two cases should be interpreted consistently, and following https://github.com/apache/arrow/pull/9948#issuecomment-817835991 they should give `(3, -4)` and `(3, -2)`. The reason for this difference is this `num_digits <= abs_exponent` check (so "123e4" with the exponent of 4 being larger than number of digits of 3 is handled differently):
   
   https://github.com/apache/arrow/blob/b3e43987c47b2f01b204a2d954f882f7161616ef/cpp/src/arrow/python/decimal.cc#L79-L84
   
   This check is to ensure to add precision in case of leading zeros as in `decimal.Decimal('0.0123')`:
   
   ```python
   >>> decimal.Decimal('0.0123').as_tuple()
   DecimalTuple(sign=0, digits=(1, 2, 3), exponent=-4)
   >>> pa.infer_type([decimal.Decimal('0.0123')])
   Decimal128Type(decimal128(4, 4))
   ```
   So we could do this `num_additional_zeros` edit only when the exponent is negative, to ensure that `Decimal('123e4')` and `Decimal('123e2')` (both with a positive exponent) are treated consistently. 
   
   However, changing that would change the tested behaviour for:
   
   ```python
   >>> pa.infer_type([decimal.Decimal('0.01E5')])
   Decimal128Type(decimal128(4, 0))
   ```
   
   to change to `decimal128(1, -3)`. Are we fine with changing this tested behaviour for `decimal.Decimal('0.01E5')]` ?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org