You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by "viirya (via GitHub)" <gi...@apache.org> on 2023/02/10 17:19:07 UTC

[GitHub] [arrow-rs] viirya opened a new pull request, #3690: Allow precision loss on multiplying decimal arrays

viirya opened a new pull request, #3690:
URL: https://github.com/apache/arrow-rs/pull/3690

# Which issue does this PR close?

Closes #3689.

# Rationale for this change

# What changes are included in this PR?

# Are there any user-facing changes?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457036830

   > So if I am following correctly, take the example of an array containing `[1, 10, 100]` multiplied by itself, with scale 2 and precision 3
   > 
   > In spark this will result in
   > 
   > ```
   > 1: {value: 1, scale: 2, precision: 3},
   > 2: {value: 100, scale: 2, precision: 3},
   > 3: {value: 100, scale: 0, precision: 3},
   > ```
   
   I guess the some values are incorrect. `{value: 100, scale: 0, precision: 3}` is 100.0, but I think the correct one should be 1.0? I guess you mean `{value: 1, scale: 0, precision: 3}`? I guess the result (if max precision is 3) is:
   
   ```
   1: {value: 0, scale: 0, precision: 3},
   2: {value: 1, scale: 2, precision: 3},
   3: {value: 100, scale: 2, precision: 3},
   ```
   
   Spark will insert a followup expression to make a final precision/scale for decimal. So the final result will be adjusted to same precision/scale again.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1456593148

   I have to confess to not really being sure how to move forward with this, as I have a number of concerns:
   
   * The performance will be catastrophic, it is formatting to strings, and performing multiple memory allocations per operation
   * The data type of the output depends on the input data, something that most query engines, including DataFusion, aren't setup to handle
   * The precision loss bleeds across rows, and therefore is deterministic on the batching, which typically isn't guaranteed in systems like DataFusion
   
   My understanding is this is trying to emulate the behaviour of Spark decimals, which store precision and scale per field. However, I'm not sure this PR actually achieves this, in the event of overflow this PR will truncate the precision of all values in the array, potentially truncating values that wouldn't have been truncated by Spark. I'm therefore not really sure what this PR gains us, it trades a behaviour change that results in an error, to a behaviour change that results in silent truncation? Am I missing something here?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103859143


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {

Review Comment:
   The result of multiplication between smaller type (i.e. `i128`) could be overflowed but it could still be represented as i128 after we truncate it below. If we don't perform multiplication as the larger type, I'm not sure if we can get the (overflowed) result and truncate on it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136889146


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1168,78 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// If the required scale is greater than the product scale, an error is returned.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(

Review Comment:
   ```suggestion
   pub fn multiply_fixed_point_checked(
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1168,78 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// If the required scale is greater than the product scale, an error is returned.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<PrimitiveArray<Decimal128Type>, ArrowError> {
+    let product_scale = left.scale() + right.scale();
+
+    if required_scale == product_scale {
+        return multiply_checked(left, right);
+    }
+
+    if required_scale > product_scale {
+        return Err(ArrowError::ComputeError(format!(
+            "Required scale {} is greater than product scale {}",
+            required_scale, product_scale
+        )));
+    }
+
+    let precision = min(
+        left.precision() + right.precision() + 1,
+        DECIMAL128_MAX_PRECISION,
+    );
+    let divisor =
+        i256::from_i128(10).pow_wrapping((product_scale - required_scale) as u32);
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        let mut mul = a.wrapping_mul(b);
+        if required_scale < product_scale {
+            mul = divide_and_round::<Decimal256Type>(mul, divisor);
+        }

Review Comment:
   ```suggestion
           mul = divide_and_round::<Decimal256Type>(mul, divisor);
   ```
   I believe this check is redundant as it is checked above



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1168,78 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// If the required scale is greater than the product scale, an error is returned.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<PrimitiveArray<Decimal128Type>, ArrowError> {
+    let product_scale = left.scale() + right.scale();
+
+    if required_scale == product_scale {
+        return multiply_checked(left, right);

Review Comment:
   ```suggestion
           return multiply_checked(left, right)?.with_precision_and_scale(precision, required_scale);
   ```
   
   I think because of https://github.com/apache/arrow-rs/issues/3307
   
   And perhaps a test to confirm



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])

Review Comment:
   ```suggestion
           // [10]
           let b = Decimal128Array::from(vec![10000000000000000000])
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =

Review Comment:
   ```suggestion
           // [1234567890]
           let expected =
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![

Review Comment:
   ```suggestion
           [0.000000000000000001, 123456789.555555555555555555, 1.555555555555555555]
           let a = Decimal128Array::from(vec![
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![
+            1,
+            123456789555555555555555555,
+            1555555555555555555,
+        ])
+        .with_precision_and_scale(38, 18)
+        .unwrap();
+
+        let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])

Review Comment:
   ```suggestion
           // [1.555555555555555555, 11.222222222222222222, 0.000000000000000001]
           let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![
+            1,
+            123456789555555555555555555,
+            1555555555555555555,
+        ])
+        .with_precision_and_scale(38, 18)
+        .unwrap();
+
+        let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected = Decimal128Array::from(vec![
+            15555555556,
+            13854595272345679012071330528765432099,
+            15555555556,
+        ])
+        .with_precision_and_scale(38, 28)
+        .unwrap();
+
+        assert_eq!(&expected, &result);
+
+        // Rounded the value "1385459527.234567901207133052876543209876543210".
+        assert_eq!(
+            result.value_as_string(1),
+            "1385459527.2345679012071330528765432099"
+        );
+        assert_eq!(result.value_as_string(0), "0.0000000000000000015555555556");
+        assert_eq!(result.value_as_string(2), "0.0000000000000000015555555556");
+
+        let a = Decimal128Array::from(vec![1230])
+            .with_precision_and_scale(4, 2)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![1000])
+            .with_precision_and_scale(4, 2)
+            .unwrap();
+
+        // Required scale is same as the product of the input scales. Behavior is same as multiply.
+        let result = mul_fixed_point_checked(&a, &b, 4)
+            .unwrap()
+            .with_precision_and_scale(9, 4)
+            .unwrap();
+        let expected = multiply_checked(&a, &b)
+            .unwrap()
+            .with_precision_and_scale(9, 4)
+            .unwrap();
+        assert_eq!(&expected, &result);

Review Comment:
   ```suggestion
           assert_eq!(&expected, &result);
           assert_eq(result.precision(), 9);
           assert_eq(result.scale(), 9);
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])

Review Comment:
   ```suggestion
           // [123456789]
           let a = Decimal128Array::from(vec![123456789000000000000000000])
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![
+            1,
+            123456789555555555555555555,
+            1555555555555555555,
+        ])
+        .with_precision_and_scale(38, 18)
+        .unwrap();
+
+        let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected = Decimal128Array::from(vec![
+            15555555556,
+            13854595272345679012071330528765432099,
+            15555555556,
+        ])
+        .with_precision_and_scale(38, 28)
+        .unwrap();
+
+        assert_eq!(&expected, &result);
+
+        // Rounded the value "1385459527.234567901207133052876543209876543210".

Review Comment:
   :+1:



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![
+            1,
+            123456789555555555555555555,
+            1555555555555555555,
+        ])
+        .with_precision_and_scale(38, 18)
+        .unwrap();
+
+        let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected = Decimal128Array::from(vec![

Review Comment:
   ```suggestion
           // [
           //    0.0000000000000000015555555556, 
           //    1385459527.2345679012071330528765432099, 
           //    0.0000000000000000015555555556
           // ]
           let expected = Decimal128Array::from(vec![
   ```



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -3231,4 +3306,91 @@ mod tests {
 
         assert_eq!(&expected, &result);
     }
+
+    #[test]
+    fn test_decimal_multiply_allow_precision_loss() {
+        // Overflow happening as i128 cannot hold multiplying result.
+        let a = Decimal128Array::from(vec![123456789000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![10000000000000000000])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let err = multiply_dyn_checked(&a, &b).unwrap_err();
+        assert!(err.to_string().contains(
+            "Overflow happened on: 123456789000000000000000000 * 10000000000000000000"
+        ));
+
+        // Allow precision loss.
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected =
+            Decimal128Array::from(vec![12345678900000000000000000000000000000])
+                .with_precision_and_scale(38, 28)
+                .unwrap();
+
+        assert_eq!(&expected, &result);
+        assert_eq!(
+            result.value_as_string(0),
+            "1234567890.0000000000000000000000000000"
+        );
+
+        // Rounding case
+        let a = Decimal128Array::from(vec![
+            1,
+            123456789555555555555555555,
+            1555555555555555555,
+        ])
+        .with_precision_and_scale(38, 18)
+        .unwrap();
+
+        let b = Decimal128Array::from(vec![1555555555555555555, 11222222222222222222, 1])
+            .with_precision_and_scale(38, 18)
+            .unwrap();
+
+        let result = mul_fixed_point_checked(&a, &b, 28).unwrap();
+        let expected = Decimal128Array::from(vec![
+            15555555556,
+            13854595272345679012071330528765432099,
+            15555555556,
+        ])
+        .with_precision_and_scale(38, 28)
+        .unwrap();
+
+        assert_eq!(&expected, &result);
+
+        // Rounded the value "1385459527.234567901207133052876543209876543210".
+        assert_eq!(
+            result.value_as_string(1),
+            "1385459527.2345679012071330528765432099"
+        );
+        assert_eq!(result.value_as_string(0), "0.0000000000000000015555555556");
+        assert_eq!(result.value_as_string(2), "0.0000000000000000015555555556");
+
+        let a = Decimal128Array::from(vec![1230])
+            .with_precision_and_scale(4, 2)
+            .unwrap();
+
+        let b = Decimal128Array::from(vec![1000])
+            .with_precision_and_scale(4, 2)
+            .unwrap();
+
+        // Required scale is same as the product of the input scales. Behavior is same as multiply.
+        let result = mul_fixed_point_checked(&a, &b, 4)
+            .unwrap()
+            .with_precision_and_scale(9, 4)
+            .unwrap();
+        let expected = multiply_checked(&a, &b)
+            .unwrap()
+            .with_precision_and_scale(9, 4)
+            .unwrap();
+        assert_eq!(&expected, &result);

Review Comment:
   ```suggestion
           assert_eq!(&expected, &result);
           assert_eq(result.precision(), 9);
           assert_eq(result.scale(), 4);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] alamb commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1452049023

   FWIW @jackwener  has identified that this PR will fix https://github.com/apache/arrow-datafusion/issues/5396d downstream in DataFusion


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103859302


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;
+                let mut diff = digits - (Decimal128Type::MAX_PRECISION as i8);
+
+                while diff > 0 {
+                    let divisor = i256::from_i128(10).pow_wrapping(diff as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                    product_scale -= diff;

Review Comment:
   If say the 5th value of the array triggers a reduction in precision, we will apply the truncation to all values that follow, but the first four values will have been computed with the original product_scale?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1456967742

> My understanding is this is trying to emulate the behaviour of Spark decimals, which store precision and scale per value. However, I'm not sure this PR actually achieves this, in the event of overflow this PR will truncate the precision of all values in the array, potentially truncating values that wouldn't have been truncated by Spark. I'm therefore not really sure what this PR gains us, it trades a behaviour change that results in an error, to a behaviour change that results in silent truncation? Am I missing something here?

In Spark, decimal expression will be followed by an expression to round decimals. Spark analyzer does type coercion for decimal arithmetic operations and inserts such expression with required precision/scale. To emulate this too, we have a similar expression so that it produces decimals with same precision/scale as Spark.

In Spark, precision/scale is stored in decimal objects. Once truncating happens, it only truncates what needed to be truncated. In the follow up expression, it will be adjusted to required precision/scale if possible.

But as we just have i128 values, we cannot just truncate single value there as it results in inconsistent precision/scale. So it turns out to be current approach that truncates all values to same precision/scale at the end of this kernel. We have a similar expression as mentioned above that adjusts decimals to required precision/scale for the arithmetic operation.

Alternative thing to do for this, I think, is to produce an internal struct of i128, precision and scale during the operation. So we don't need to truncate other values. The kernel also takes required precision/scale as parameters and adjusts decimal values accordingly.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1458771053

   > * DataFusion is updated to follow Spark's type inference for arithmetic on decimals (if it doesn't already)
   > * In particular, multiplication of decimals with precision and scale (p1, s1) and (p2, s2) yields (p1 + p2 + 1, s1 + s2)
   
   I remember DataFusion already has type coercion rule for decimal type that copies the behavior from Spark's one (the posted code link above).
   
   > * By default, an error is returned if the output precision exceeds the maximum precision of the decimal type - as per [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L54)
   
   This is just the constraint on how a DecimalType is constructed. Obviously it isn't allowed to create a DecimalType over largest precision. It doesn't actually affect decimal arithmetic op.
   
   > * Arithmetic can proceed as normal using the regular unchecked arithmetic kernels, as the precision ensures overflow cannot occur
   
   So this is not correct actually. Spark has an expression to check overflow of decimal arithmetic op result. It checks if the value can fit into required precision/scale. If not, getting a null or exception, depending on config.
   
   > We can then extend this so that if the output precision exceeds the maximum precision of the decimal type, and an allowDecimalPrecisionLoss config option is true, instead of returning a plan error DataFusion can reduce the precision of the result. Looking at the [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L184) logic for this, it only does this when it will leave at least 6 digits of fractional precision in the result.
   
   `adjustPrecisionScale` is just used by the type coercion rule to adjust precision/scale for decimal arithmetic op. I think DataFusion should already has it as it has copied Spark decimal type coercion rule. In other words, these items are basically analysis phase rule which gets the decimal type of arithmetic op result. But it doesn't guide how the op actually performs. After op, another expression will check overflow. Although Spark updates slightly this process in version 3.4, this process is still the same.
   
   > To facilitate this we can add a `mul_fixed_point` and `mul_fixed_point_checked` to arrow-rs, along with potentially scalar variants, that in addition to their regular inputs, also take a scale. I suspect there is a more efficient way to do this, but at least initially this could just perform an unchecked/checked widening multiplication, divide the result, and then perform an unchecked/checked truncation. Initially we could just support i128, but I think supporting all integral types eventually would be cool.
   
   Let me figure it out. So this sounds similar to this PR but takes a scale parameter. After multiplication, truncating the result according to the required scale?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136410370


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)

Review Comment:
   Yea, it's redundant.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136309865


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {

Review Comment:
   Perhaps this could fallback to using the regular checked multiply kernel if product_scale == required_scale



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457056527

   > > So the final result will be adjusted to same precision/scale again
   > 
   > But I thought we just established that the values don't fit into the original precision / scale? I'm sorry but I'm really confused...
   
   Not sure I get it correctly. But the adjustment is to make sure all values have same precision/scale if possible. e.g, for the above one:
   
   ```
   1: {value: 0, scale: 0, precision: 3},
   2: {value: 1, scale: 2, precision: 3},
   3: {value: 100, scale: 2, precision: 3},
   ```
   
   The precision/scale can be inconsistent due to truncating per value. The followup expression will adjust them to same precision/scale (depending on the arithmetic op, decided by Spark type coercion rule). If value cannot be fit into it, it could be null or runtime exception, depending on config.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103858029


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)

Review Comment:
   You mean `i256.checked_mul` is slow?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103905625


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;

Review Comment:
   `log10` on `i256`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457043945

   > So the final result will be adjusted to same precision/scale again
   
   But I thought we just established that the values don't fit into the original precision / scale? I'm sorry but I'm really confused...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103859362


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)

Review Comment:
   Yes, there is a TODO to optimise it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103859517


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {

Review Comment:
   Makes sense, I hadn't got to the truncation logic when I wrote this :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] alamb commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "alamb (via GitHub)" <gi...@apache.org>.

alamb commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1456724276

   > it trades a behaviour change that results in an error, to a behaviour change that results in silent truncation? Am I missing something here?
   
   I think another concern is that this trade comes with a significant performance cost for the non overflow case
   
   What if we changed the DataFusion type coercion code so that it was able to handle the large literal cases 
   
   Like special cased
   https://github.com/apache/arrow-datafusion/blob/99ef989312292b04efdf06b178617e9008b46c84/datafusion/optimizer/src/type_coercion.rs#L589-L598
   
   So that it could handle a literal that was too large for its declared type?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136311707


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();

Review Comment:
   Is this correct?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136410537


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                if required_scale < product_scale {

Review Comment:
   Returning an error for that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1471456352

   Thanks for review. I will add `mul_fixed_point` and scalar variants in other PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] liukun4515 commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "liukun4515 (via GitHub)" <gi...@apache.org>.

liukun4515 commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1430789791

   I want to review this PR in my time tomorrow. @viirya 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457899035

   Ok, thank you for your patience with me. Would the following therefore be workable and consistent with Spark.
   
   * DataFusion is updated to follow Spark's type inference for arithmetic on decimals (if it doesn't already)
   * In particular, multiplication of decimals with precision and scale (p1, s1) and (p2, s2) yields (p1 + p2 + 1, s1 + s2)
   * By default, an error is returned if the output precision exceeds the maximum precision of the decimal type - as per [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L54)
   * Arithmetic can proceed as normal using the regular unchecked arithmetic kernels, as the precision ensures overflow cannot occur
   
   We can then extend this so that if the output precision exceeds the maximum precision of the decimal type, and an allowDecimalPrecisionLoss config option is true, instead of returning a plan error DataFusion can reduce the precision of the result. Looking at the [Spark](https://github.com/apache/spark/blob/4b50a46f1c6ba4ffe2c42f70a512879c28d11dcf/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L184) logic for this, it only does this when it will leave at least 6 digits of fractional precision in the result.
   
   To facilitate this we can add a `mul_fixed_point` and `mul_fixed_point_checked` to arrow-rs, along with potentially scalar variants, that in addition to their regular inputs, also take a scale. I suspect there is a more efficient way to do this, but at least initially this could just perform an unchecked/checked widening multiplication, divide the result, and then perform an unchecked/checked truncation. Initially we could just support i128, but I think supporting all integral types eventually would be cool.
   
   I think this would address my concerns with this PR as it would be explicitly opt-in, deterministic, and have adequate performance. What do you think?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136292121


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                if required_scale < product_scale {

Review Comment:
   What happens if required_scale is greater than product_scale, should we just assert?



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)

Review Comment:
   Does this need to be checked, I don't think it can overflow?



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                if required_scale < product_scale {
+                    let divisor = i256::from_i128(10)

Review Comment:
   I think computing this could be lifted out of the try_unary?



##########
arrow-arith/src/arity.rs:
##########
@@ -77,7 +77,7 @@ pub fn try_unary<I, F, O>(
 where
     I: ArrowPrimitiveType,
     O: ArrowPrimitiveType,
-    F: Fn(I::Native) -> Result<O::Native, ArrowError>,
+    F: FnMut(I::Native) -> Result<O::Native, ArrowError>,

Review Comment:
   Do we still need these FnMut changes?



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1167,77 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let product_scale = left.scale() + right.scale();
+
+    try_binary::<_, _, _, Decimal128Type>(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                if required_scale < product_scale {
+                    let divisor = i256::from_i128(10)
+                        .pow_wrapping((product_scale - required_scale) as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                }
+                a
+            })
+            .ok_or_else(|| {
+                ArrowError::ComputeError(format!(
+                    "Overflow happened on: {:?} * {:?}, {:?}",
+                    a,
+                    b,
+                    a.checked_mul(b)
+                ))
+            })
+            .and_then(|a| {

Review Comment:
   The ? operator may be more legible than chaining and_then



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103865876


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;
+                let mut diff = digits - (Decimal128Type::MAX_PRECISION as i8);
+
+                while diff > 0 {
+                    let divisor = i256::from_i128(10).pow_wrapping(diff as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                    product_scale -= diff;

Review Comment:
   As the behavior we're going to match is under row-based and object-based, it is somehow different in many aspects regarding being an array kernel. Need to think more about it with a proper fix. 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103851353


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)

Review Comment:
   FWIW this will be exceptionally slow, it currently performs two memory allocations :sweat_smile: 



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {

Review Comment:
   This might be a stupid question, I know very little about decimal arithmetic, but if you are just going to truncate the result why do you need to perform multiplication as the larger type?



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;

Review Comment:
   `log10` perhaps?



##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;
+                let mut diff = digits - (Decimal128Type::MAX_PRECISION as i8);
+
+                while diff > 0 {
+                    let divisor = i256::from_i128(10).pow_wrapping(diff as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                    product_scale -= diff;

Review Comment:
   I'm not sure this is correct, as it won't "recompute" the values that have already been evaluated?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103857980


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;
+                let mut diff = digits - (Decimal128Type::MAX_PRECISION as i8);
+
+                while diff > 0 {
+                    let divisor = i256::from_i128(10).pow_wrapping(diff as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                    product_scale -= diff;

Review Comment:
   Do you mean `a`? It is divided and rounded.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1103864006


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1241,6 +1243,74 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result. In the case, the result will be rounded.
+pub fn multiply_decimal(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+) -> Result<ArrayRef, ArrowError> {
+    let precision = left.precision();
+    let mut product_scale = left.scale() + right.scale();
+
+    math_checked_op(left, right, |a, b| {
+        let a = i256::from_i128(a);
+        let b = i256::from_i128(b);
+
+        a.checked_mul(b)
+            .map(|mut a| {
+                // Round the value if an exact representation is not possible.
+                // ref: java.math.BigDecimal#doRound
+                let mut digits = a.to_string().len() as i8;
+                let mut diff = digits - (Decimal128Type::MAX_PRECISION as i8);
+
+                while diff > 0 {
+                    let divisor = i256::from_i128(10).pow_wrapping(diff as u32);
+                    a = divide_and_round::<Decimal256Type>(a, divisor);
+                    product_scale -= diff;

Review Comment:
   Aah, I think you're correct. I forgot that in JVM that each value is a decimal object which keeps precision/scale by itself. So it allows each value has different precision/scale after the computation. I need to rethink this for a fix.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya merged pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya merged PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1456992755

   So if I am following correctly, take the example of an array containing `[1, 10, 100]` multiplied by itself, with scale 2 and precision 3
   
   In spark this will result in
   
   ```
   1: {value: 1, scale: 2, precision: 3},
   2: {value: 100, scale: 2, precision: 3},
   3: {value: 100, scale: 4, precision: 3},
   ```
   
   With this PR this will result in a new array with scale 4 and precision 3 containing
   
   ```
   1: 0,
   2: 1,
   3: 100
   ```
   
   I don't follow how this is comparable behaviour?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] tustvold commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "tustvold (via GitHub)" <gi...@apache.org>.

tustvold commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457064126

   > The followup expression will adjust them to same precision/scale
   
   Could you expand on this, how does it decide the output precision/scale to unify to? Is it the same as the input? Is it data dependent? Unless I am mistaken in arrow the precision and scale are an immutable part of the column's schema that must be known at plan time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1457670060

   > Could you expand on this, how does it decide the output precision/scale to unify to? Is it the same as the input? Is it data dependent? Unless I am mistaken in arrow the precision and scale are an immutable part of the column's schema that must be established at plan time.
   
   Spark type coercion rule decides the output precision/scale: e.g., https://github.com/apache/spark/blob/branch-3.2/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DecimalPrecision.scala#L126-L136. It is dependent on input precision/scale. Yea, it is established at plan (query analysis) time.
   
   > I wonder if the issue is that DataFusion is not inspecting the precision in order to work out if there is a potential for overflow, and scaling the input as appropriate based on this information? This is what I had understood the whole purpose of the precision argument as being for?
   
   I think it is not possible to inspect potential overflow for input. Because overflowing or not isn't just dependent on precision/scale only but the actual input data.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#issuecomment-1427117474

   > This might be a dumb question, but can we not do something with precision statically instead of per-value. i.e. if multiplying two decimal numbers with precision of 10 and scale 5, the output should have precision of 20 and scale 10. The plan can then intentionally truncate inputs using division if necessary?
   
   Hmm, I think one reason is that we are not sure if truncation will actually happen before multiplying values. If not, early truncating inputs could loss precision which can be avoided? 
   
   > This might be another dumb thought, but if you want truncating arithmetic, why not just cast to a floating point representation? It'll be faster and better supported? I guess I had thought the only reason to still bother with decimal precision these days was if you couldn't tolerate approximation?
   
   A major reason for this work is to provide compatibility to existing queries. If our users happened to use decimal for whatever reasons and they just rely on such truncation behavior, they may not tolerate a different behavior like failing their queries. Besides, as truncating is not always happened but just a possibility, they might tolerate approximation for it for rare case (if happened) but for all values (this is what I thought, not voice from real users).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1136390514


##########
arrow-arith/src/arity.rs:
##########
@@ -77,7 +77,7 @@ pub fn try_unary<I, F, O>(
 where
     I: ArrowPrimitiveType,
     O: ArrowPrimitiveType,
-    F: Fn(I::Native) -> Result<O::Native, ArrowError>,
+    F: FnMut(I::Native) -> Result<O::Native, ArrowError>,

Review Comment:
   Reverted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-rs] viirya commented on a diff in pull request #3690: Allow precision loss on multiplying decimal arrays

Posted by "viirya (via GitHub)" <gi...@apache.org>.

viirya commented on code in PR #3690:
URL: https://github.com/apache/arrow-rs/pull/3690#discussion_r1137570366


##########
arrow-arith/src/arithmetic.rs:
##########
@@ -1165,6 +1168,78 @@ pub fn multiply_dyn_checked(
     }
 }
 
+/// Perform `left * right` operation on two decimal arrays. If either left or right value is
+/// null then the result is also null.
+///
+/// This performs decimal multiplication which allows precision loss if an exact representation
+/// is not possible for the result, according to the required scale. In the case, the result
+/// will be rounded to the required scale.
+///
+/// If the required scale is greater than the product scale, an error is returned.
+///
+/// It is implemented for compatibility with precision loss `multiply` function provided by
+/// other data processing engines. For multiplication with precision loss detection, use
+/// `multiply` or `multiply_checked` instead.
+pub fn mul_fixed_point_checked(
+    left: &PrimitiveArray<Decimal128Type>,
+    right: &PrimitiveArray<Decimal128Type>,
+    required_scale: i8,
+) -> Result<PrimitiveArray<Decimal128Type>, ArrowError> {
+    let product_scale = left.scale() + right.scale();
+
+    if required_scale == product_scale {
+        return multiply_checked(left, right);

Review Comment:
   There is a test for `required_scale == product_scale` case. I just need to remove `with_precision_and_scale` in the test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org