You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/02 15:52:17 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #1779: DataType::Decimal Non-Compliant

tustvold opened a new issue, #1779:
URL: https://github.com/apache/arrow-rs/issues/1779

   **Describe the bug**
   
   DataType::Decimal is defined as
   
   ```
   /// Exact decimal value with precision and scale
   ///
   /// * precision is the total number of digits
   /// * scale is the number of digits past the decimal
   ///
   /// For example the number 123.45 has precision 5 and scale 2.
   Decimal(usize, usize),
   ```
   
   This appears to be at odds with both the [C++](https://arrow.apache.org/docs/cpp/api/datatype.html?highlight=decimal#classarrow_1_1_decimal128_type) and [python](https://arrow.apache.org/docs/python/generated/pyarrow.decimal128.html) implementations (I can't actually find the specification).
   
   These define it as
   
   > Arrow decimals are fixed-point decimal numbers encoded as a scaled integer. The precision is the number of significant digits that the decimal type can represent; the scale is the number of digits after the decimal point (note the scale can be negative).
   
   In particular with the current rust definition it is unclear how to represent numbers with more than 38 digits, either because of leading or trailing 0s.
   
   **To Reproduce**
   
   Inspect code
   
   **Expected behavior**
   
   We should be conforming to the other arrow implementations
   
   **Additional context**
   
   Noticed whilst reviewing https://github.com/apache/arrow-datafusion/pull/2680
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #1779: DataType::Decimal Non-Compliant
URL: https://github.com/apache/arrow-rs/issues/1779


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] jorgecarleitao commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146037480

   Sorry, I am probably misreading this - I am not following what is the non-compliance.
   
   The Python documentation exemplifies:
   
   > As an example, decimal128(7, 3) can exactly represent the numbers 1234.567 and -1234.567 (encoded internally as the 128-bit integers 1234567 and -1234567, respectively)
   
   (so, 7 digits, 3 of them are decimal)
   
   we offer the example
   
   > For example the number 123.45 has precision 5 and scale 2.
   
   The spec is https://github.com/apache/arrow/blob/master/format/Schema.fbs#L185 (copied for readability below)
   
   ```
   /// Exact decimal value represented as an integer value in two's
   /// complement. Currently only 128-bit (16-byte) and 256-bit (32-byte) integers
   /// are used. The representation uses the endianness indicated
   /// in the Schema.
   table Decimal {
     /// Total number of decimal digits
     precision: int;
   
     /// Number of digits after the decimal point "."
     scale: int;
   
     /// Number of bits per value. The only accepted widths are 128 and 256.
     /// We use bitWidth for consistency with Int::bitWidth.
     bitWidth: int = 128;
   }
   ```
   
   Aren't they all aligned?
   
   > In particular with the current rust definition it is unclear how to represent numbers with more than 38 digits, either because of leading or trailing 0s.
   
   The Python docs continue
   
   > [Decimal128Type](https://arrow.apache.org/docs/cpp/api/datatype.html?highlight=decimal#classarrow_1_1_decimal128_type) has a maximum precision of 38 significant digits
   
   which also seems aligned?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146591354

   > The scale field within DataType::Decimal should be signed
   
   I agree
   
   > The documentation should refer to scale as an exponent, and not potentially misleading notions of decimal points and digits
   
   I am not sure about this -- I think the documentation could be clarified but talking about scale/exponent/mantissa is not appropriate for fixed width representation (they are floating point concepts)
   
   > DecimalArray::value shouldn't be applying the scale to the value at all
   
   Definitely worth discussing -- I suspect some people would like `DecimalArray` to convert to `f64` and some would like to do the conversion themselves. I can see a need for access to the raw `i128` as well as a conversion
   
   > We should probably audit the other places that interpret decimal values for consistency
   
   👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146593465

   Apologies this appears to have been my confusion, we should create a separate ticket to track making scale negative and clarifying the docs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146020869

   I don't remember any details -- the discrepancy is likely my ignorance


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146072835

   > The spec is
   
   Thank you for the link, that is very unfortunately worded but explains where this confusion has originated from.  My interpretation of those comments would be that the represented values would have `precision`  digits, with the decimal point located somewhere in that range. This also was the interpretation initially taken on https://github.com/apache/arrow-datafusion/pull/2680, and appears also to at least initially have been the interpretation of the decimal implementation in arrow-rs.
   
   If one takes the python and C++ implementations as correct, what the docs appear are actually saying is that the mantissa has `precision` digits, and `scale` is the exponent, but then talks about digits past a decimal point, which is somewhat inconsistent.... 
   
   To highlight the distinction, consider the number `1000` or `0.0001`. Do these have 4 digits or 1 digit? If they have 1 digit, how many digits does they have past the decimal point? 
   
   Ultimately I think there are at least these concrete issues that should be fixed:
   
   * The scale field within DataType::Decimal should be signed
   * The documentation should refer to scale as an exponent, and not potentially misleading notions of decimal points and digits
   * `DecimalArray::value` shouldn't be applying the scale to the value at all
   * We should probably audit the other places that interpret decimal values for consistency
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1145045156

   To add to the confusion `value` and `value_as_string` seem to interpret the data differently, with value interpreting it inline with the arrow "specification" and value_as_string doing something different.
   
   ```
   #[test]
   fn test_decimal_array() {
       // let val_8887: [u8; 16] = [192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0];
       // let val_neg_8887: [u8; 16] = [64, 36, 75, 238, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255];
       let values: [u8; 32] = [
           192, 219, 180, 17, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 64, 36, 75, 238, 253,
           255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
       ];
       let array_data = ArrayData::builder(DataType::Decimal(23, 6))
           .len(2)
           .add_buffer(Buffer::from(&values[..]))
           .build()
           .unwrap();
       let decimal_array = DecimalArray::from(array_data);
       assert_eq!(8_887_000_000, decimal_array.value(0));
       assert_eq!("8887.000000", decimal_array.value_as_string(0));
       assert_eq!(-8_887_000_000, decimal_array.value(1));
       assert_eq!(16, decimal_array.value_length());
   }
   ```
   
   Decimal support appears to have been added back in https://github.com/apache/arrow/pull/8640, perhaps @nevi-me, @alamb  or @jorgecarleitao have some recollection of what is going on here


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on issue #1779: DataType::Decimal Non-Compliant

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1779:
URL: https://github.com/apache/arrow-rs/issues/1779#issuecomment-1146590989

   I think "Fixed Width Decimal" is a fairly common concept and is used for some specialized usecases (like currency), and I think the arrow spec basically follow the standard practice. https://en.wikipedia.org/wiki/Fixed-point_arithmetic#Representation 
   
   Like  @jorgecarleitao , I don't see the problem. 
   
   > 1000 or 0.0001. Do these have 4 digits or 1 digit? If they have 1 digit, how many digits do they have past the decimal point?
   
   I think the better question is "how would these be represented in different Decimal types"
   
   For `Decimal(5, 1)` they would be represented by `i128`s with the values `10000` and `00001` respectively
   
   You could also represent `1000` as `Decimal(5, 0)` with `i128` value 10000 or `Decimal(4,0)` with `i128` value `1000`.
   
   > In particular with the current rust definition it is unclear how to represent numbers with more than 38 digits 
   
   I do not think that is possible. The reason being that `i128::MAX` is `170141183460469231731687303715884105727`, which can represent only 38 decimal digits


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org