Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/03 15:57:42 UTC

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #1394: support decimal scalar value

alamb commented on a change in pull request #1394:
URL: https://github.com/apache/arrow-datafusion/pull/1394#discussion_r762037532



##########
File path: datafusion/src/scalar.rs
##########
@@ -453,6 +477,26 @@ macro_rules! eq_array_primitive {
 }
 
 impl ScalarValue {
+    /// Create a decimal Scalar from value/precision and scale.
+    pub fn try_new_decimal128(
+        value: i128,
+        precision: usize,
+        scale: usize,
+    ) -> Result<Self> {
+        // make sure the precision and scale is valid
+        // TODO const the max precision and min scale

Review comment:
       The arrow spec doesn't seem to say anything about min/max valid scales: https://github.com/apache/arrow/blob/master/format/Schema.fbs#L181-L190
   
   I poked around a little in arrow-rs and didn't find any limits on the precision or scale there either
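   
   For concreteness, a minimal sketch of what the validation might look like, assuming one plausible set of SQL-style constraints (0 < precision <= 38 and scale <= precision; 38 digits because 10^38 - 1 < 2^127, so any 38-digit value fits in an i128) -- since neither the spec nor arrow-rs pins these down, treat the bounds as assumptions:
   
   ```rust
   impl ScalarValue {
       /// Create a decimal scalar, rejecting invalid precision/scale up front.
       pub fn try_new_decimal128(
           value: i128,
           precision: usize,
           scale: usize,
       ) -> Result<Self> {
           // Hypothetical bound: any 38-digit decimal fits in an i128.
           const MAX_PRECISION: usize = 38;
           if precision == 0 || precision > MAX_PRECISION || scale > precision {
               return Err(DataFusionError::Internal(format!(
                   "Invalid precision {} or scale {} for Decimal128",
                   precision, scale
               )));
           }
           // Field shape matches the PR's variant as diffed (Option-wrapped).
           Ok(ScalarValue::Decimal128(Some(value), Some(precision), Some(scale)))
       }
   }
   ```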

##########
File path: datafusion/src/scalar.rs
##########
@@ -171,6 +179,17 @@ impl PartialOrd for ScalarValue {
         // any newly added enum variant will require editing this list
         // or else face a compile error
         match (self, other) {
+            // TODO decimal type, we just compare the values which have the same precision and scale.

Review comment:
       I think the code in this PR is correct -- two decimals can be compared if they have the same precision/scale, and they can't be compared (`partial_cmp` returns `None`) if they have different precisions/scales.
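   
   As a standalone sketch of those semantics (a hypothetical free function, not the PR's actual code):
   
   ```rust
   use std::cmp::Ordering;
   
   /// Decimals order by value only when precision and scale agree;
   /// otherwise the comparison is undefined and we return None.
   fn cmp_decimal128(
       (v1, p1, s1): (Option<i128>, usize, usize),
       (v2, p2, s2): (Option<i128>, usize, usize),
   ) -> Option<Ordering> {
       if p1 == p2 && s1 == s2 {
           v1.partial_cmp(&v2)
       } else {
           None
       }
   }
   ```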

##########
File path: datafusion/src/scalar.rs
##########
@@ -100,6 +102,12 @@ impl PartialEq for ScalarValue {
         // any newly added enum variant will require editing this list
         // or else face a compile error
         match (self, other) {
+            (Decimal128(v1, p1, s1), Decimal128(v2, p2, s2)) => {
+                v1.eq(v2) && p1.eq(p2) && s1.eq(s2)
+                // TODO how to handle this case: decimal(123,10,1) with decimal(1230,10,2)

Review comment:
        `Int8(Some(100))` is not equal to `Int64(Some(100))` with this implementation either, so I think it is consistent with the rest of these comparisons that `decimal(123,10,1)` and `decimal(1230,10,2)` are different.
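   
   To illustrate (hypothetical asserts, assuming the `PartialEq` impl shown in the diff above):
   
   ```rust
   let a = ScalarValue::Decimal128(Some(123), Some(10), Some(1));  // 12.3
   let b = ScalarValue::Decimal128(Some(1230), Some(10), Some(2)); // 12.30
   // Same numeric quantity, but different value and scale => not equal,
   // just as Int8(Some(100)) != Int64(Some(100)).
   assert!(a != b);
   ```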
   
   

##########
File path: datafusion/src/scalar.rs
##########
@@ -43,6 +43,8 @@ pub enum ScalarValue {
     Float32(Option<f32>),
     /// 64bit float
     Float64(Option<f64>),
+    /// 128bit decimal, using the i128 to represent the decimal
+    Decimal128(Option<i128>, Option<usize>, Option<usize>),

Review comment:
       🤔  it seems like we always need the precision and scale (to know what type the Decimal is). 
   
   Thus, perhaps this could be changed to be
   
   ```suggestion
       Decimal128(Option<i128>, usize, usize),
   ```
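   
   One nice side effect: a NULL decimal would still carry its type. A tiny illustration, assuming the suggested variant:
   
   ```rust
   // NULL of type Decimal(10, 2): the value is absent, the type is not.
   let null_dec = ScalarValue::Decimal128(None, 10, 2);
   ```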

##########
File path: datafusion/src/scalar.rs
##########
@@ -465,6 +509,14 @@ impl ScalarValue {
             ScalarValue::Int16(_) => DataType::Int16,
             ScalarValue::Int32(_) => DataType::Int32,
             ScalarValue::Int64(_) => DataType::Int64,
+            ScalarValue::Decimal128(_, Some(precision), Some(scale)) => {
+                DataType::Decimal(*precision, *scale)
+            }
+            ScalarValue::Decimal128(_, _, _) => {
+                // TODO add the default precision and scale for this case
+                // DataType::Decimal(38, 0)
+                panic!("The Decimal Scalar value with invalid precision or scale.");

Review comment:
       I think if you changed `Decimal128` as suggested above to always carry the precision and scale, this code path would be impossible
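   
   For example, the two arms above could collapse into a single total arm (a sketch against the suggested variant):
   
   ```rust
   // Precision and scale are always present, so there is no invalid
   // state left to panic on.
   ScalarValue::Decimal128(_, precision, scale) => {
       DataType::Decimal(*precision, *scale)
   }
   ```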

##########
File path: datafusion/src/scalar.rs
##########
@@ -1061,19 +1176,41 @@ impl ScalarValue {
                     Arc::new(StructArray::from(field_values))
                 }
                 None => {
-                    let field_values: Vec<_> = fields.iter().map(|field| {
+                    let field_values: Vec<_> = fields
+                        .iter()
+                        .map(|field| {
                             let none_field = Self::try_from(field.data_type()).expect(
                                 "Failed to construct null ScalarValue from Struct field type"
                             );
                             (field.clone(), none_field.to_array_of_size(size))
-                        }).collect();
+                        })
+                        .collect();
 
                     Arc::new(StructArray::from(field_values))
                 }
             },
         }
     }
 
+    fn get_decimal_value_from_array(
+        array: &ArrayRef,
+        index: usize,
+        precision: &usize,
+        scale: &usize,
+    ) -> ScalarValue {
+        let array = array.as_any().downcast_ref::<DecimalArray>().unwrap();
+        // TODO add checker: the precision and scale are same with array

Review comment:
       this would be a good `assert!`-style check to add, though I don't think it would ever fire unless there is some bug elsewhere
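   
   A sketch of what that check might look like, assuming `DecimalArray` exposes `precision()`/`scale()` accessors as arrow-rs does:
   
   ```rust
   let array = array.as_any().downcast_ref::<DecimalArray>().unwrap();
   // Catch mismatched callers early; this should only fire on a bug.
   assert_eq!(array.precision(), *precision, "precision mismatch");
   assert_eq!(array.scale(), *scale, "scale mismatch");
   ```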

##########
File path: datafusion/src/scalar.rs
##########
@@ -831,13 +892,40 @@ impl ScalarValue {
                     "Unsupported creation of {:?} array from ScalarValue {:?}",
                     data_type,
                     scalars.peek()
-                )))
+                )));
             }
         };
 
         Ok(array)
     }
 
+    fn iter_to_decimal_array(
+        scalars: impl IntoIterator<Item = ScalarValue>,
+        precision: &usize,
+        scale: &usize,
+    ) -> Result<DecimalArray> {
+        // collect the value as Option<i128>
+        let array = scalars
+            .into_iter()
+            .map(|element: ScalarValue| match element {
+                ScalarValue::Decimal128(v1, _, _) => v1,
+                _ => unreachable!(),
+            })
+            .collect::<Vec<Option<i128>>>();

Review comment:
       Using `DecimalBuilder` in this way is awkward. I wonder if we could add a function to create a `DecimalArray` from `i128` values (it would likely go in arrow-rs). Something like
   
   ```
   let arr = DecimalArray::from_iter_and_scale(array.into_iter(), precision, scale)
   ```
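   
   For reference, a sketch of the builder-based loop such a helper would replace (`DecimalBuilder` API as of arrow-rs 6.x, where `append_value`/`append_null` return `Result` because values are range-checked against the precision; treat the exact signatures as an assumption):
   
   ```rust
   let mut builder = DecimalBuilder::new(array.len(), *precision, *scale);
   for value in array {
       match value {
           Some(v) => builder.append_value(v)?,
           None => builder.append_null()?,
       }
   }
   let decimal_array: DecimalArray = builder.finish();
   ```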

##########
File path: datafusion/src/scalar.rs
##########
@@ -253,6 +272,11 @@ impl std::hash::Hash for ScalarValue {
     fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
         use ScalarValue::*;
         match self {
+            Decimal128(v, p, s) => {
+                v.hash(state);
+                p.hash(state);
+                s.hash(state)

Review comment:
       I think we could probably skip hashing the precision and scale and just hash the value
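   
   Hashing a subset of the fields used by `Eq` keeps the `Hash`/`Eq` contract intact (equal values still hash equally) and only risks a few extra collisions, so the arm could shrink to (a sketch):
   
   ```rust
   // Precision and scale only disambiguate types; skipping them here
   // cannot break the Hash/Eq contract, it can only add collisions.
   Decimal128(v, _, _) => v.hash(state),
   ```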




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org