You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/30 12:21:18 UTC

[GitHub] [arrow] sweb opened a new pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

sweb opened a new pull request #9047:
URL: https://github.com/apache/arrow/pull/9047


   This PR adds capabilities to read decimal columns in parquet files that store them as i32 or i64.
   
   I tried to follow the approach in #8926 by using casts. However, there is an issue with my solution that I expect to be problematic:
   
   Casting from i32/i64 to decimal assumes that the integer value was intended to represent a decimal value, since it just transforms the i32/i64 to i128. This will lead to usability issues once someone tries to cast an actual `Int64Array` to `DecimalArray`, since this will most likely not return the expected outcome (I would expect that value to be multiplied by 10^scale). I am currently not sure how to resolve this while still using casts to get from i32/i64 to decimal. Maybe it makes sense to remove the casting operations and put the logic in `PrimitiveArrayReader::next_batch`.
   
   Let me know what you think!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550821639



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,18 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> DataType {
+        assert!(self.schema.is_primitive());
+        if let Type::PrimitiveType {
+            precision, scale, ..
+        } = self.schema
+        {
+            DataType::Decimal(*precision as usize, *scale as usize)
+        } else {

Review comment:
       oops my bad - this looks good then. 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nevi-me commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
nevi-me commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550964408



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       @sweb what does downcasting the array straignt to an i64 result in? Does the below panic?
   
   ```rust
   let mut builder = DecimalBuilder::new(array.len(), *p, *s);
   let values = array.as_any().downcast_ref::<Int64Array>().unwrap();
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550882210



##########
File path: rust/parquet/src/schema/types.rs
##########
@@ -103,6 +103,24 @@ impl Type {
         }
     }
 
+    /// Gets precision of this primitive type.
+    /// Note that this will panic if called on a non-primitive type.
+    pub fn get_precision(&self) -> i32 {
+        match *self {
+            Type::PrimitiveType { precision, ..} => precision,
+            _ => panic!("Cannot call get_precision() on non-primitive type")
+        }
+    }
+
+    /// Gets scale of this primitive type.
+    /// Note that this will panic if called on a non-primitive type.
+    pub fn get_scale(&self) -> i32 {
+        match *self {
+            Type::PrimitiveType { scale, ..} => scale,
+            _ => panic!("Cannot call get_scale() on non-primitive type")

Review comment:
       I added one more commit, using the new getters in `array_reader` to simplify the decimal converter construction.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550878051



##########
File path: rust/parquet/src/schema/types.rs
##########
@@ -103,6 +103,24 @@ impl Type {
         }
     }
 
+    /// Gets precision of this primitive type.
+    /// Note that this will panic if called on a non-primitive type.
+    pub fn get_precision(&self) -> i32 {
+        match *self {
+            Type::PrimitiveType { precision, ..} => precision,
+            _ => panic!("Cannot call get_precision() on non-primitive type")
+        }
+    }
+
+    /// Gets scale of this primitive type.
+    /// Note that this will panic if called on a non-primitive type.
+    pub fn get_scale(&self) -> i32 {
+        match *self {
+            Type::PrimitiveType { scale, ..} => scale,
+            _ => panic!("Cannot call get_scale() on non-primitive type")

Review comment:
       this is much nicer (and it is now clearer what is expected) 👍 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550798792



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       Hmm, should this be handled in `build_for_primitive_type_inner` inside `array_reader.rs`? where we construct a `ComplexObjectArrayReader` with the right converter for decimal types. I see we currently handle the case for `FIXED_LEN_BYTE_ARRAY` but not for `INT32`/`INT64`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550821867



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,18 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> DataType {
+        assert!(self.schema.is_primitive());
+        if let Type::PrimitiveType {
+            precision, scale, ..
+        } = self.schema
+        {
+            DataType::Decimal(*precision as usize, *scale as usize)
+        } else {

Review comment:
       I think it may make sense to add `get_scale` and `get_precision` to `Type` as well, similar to other getters there, but it's just a good to have :)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550321322



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> Result<DataType> {
+        let (precision, scale) = match self.schema {
+            Type::PrimitiveType {
+                ref precision,
+                ref scale,
+                ..
+            } => (*precision, *scale),
+            _ => {
+                return Err(ArrowError(

Review comment:
       I think you can use assertion here since this is only supposed to be called on primitive types. And it's more concise to do something like this:
   
   ```rust
           match self.schema {
               Type::PrimitiveType {
                   ref precision,
                   ref scale,
                   ..
               } => Ok(DataType::Decimal(*precision as usize, *scale as usize)),
               _ => Err(ArrowError(
                   "Expected a physical type, not a group type".to_string(),
               )),
           }
   ```

##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -532,6 +533,7 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Int64, Int32) => cast_numeric_arrays::<Int64Type, Int32Type>(array),
         (Int64, Float32) => cast_numeric_arrays::<Int64Type, Float32Type>(array),
         (Int64, Float64) => cast_numeric_arrays::<Int64Type, Float64Type>(array),
+        (Int64, Decimal(p, s)) => int64_to_decimal_cast(array, *p, *s),

Review comment:
       hmm... is this related to the parquet-side change?

##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       should this be handled somehow in the parquet/arrow reader? 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550881168



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       I added an implementation using converters (I have not pushed it yet), but the creation of `ParquetTypeConverter` occurs before we even get to `build_for_primitive_type_inner`:
   
   ```
   ParquetFileArrowReader::get_record_reader ->
   ParquetFileArrowReader::get_record_reader_by_columns -> 
   ParquetFileArrowReader::get_schema ->
   schema::parquet_to_arrow_schema ->
   schema::parquet_to_arrow_schema_by_columns ->
   ParquetTypeConverter::new
   ```
   Thus, if we want to avoid the decimal case, a decimal specific implementation in `get_schema` would probably be required.
   
   The converters could avoid the decimal specific branch in PrimitiveArrayReader::next_batch:
   
   ```
   ArrowType::Decimal(p, s) => {
                   let to_int64 = arrow::compute::cast(&array, &ArrowType::Int64)?;
                   let mut builder = DecimalBuilder::new(to_int64.len(), *p, *s);
                   let values = to_int64.as_any().downcast_ref::<Int64Array>().unwrap();
                   for maybe_value in values.iter() {
                       match maybe_value {
                           Some(value) => builder.append_value(value as i128)?,
                           None => builder.append_null()?
                       }
                   }
                   Arc::new(builder.finish()) as ArrayRef
               }
   ```
   
   We could avoid the cast to i64 by adding converters for i32/i64 for but I think this is not such a big deal - what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550806533



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,18 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> DataType {
+        assert!(self.schema.is_primitive());
+        if let Type::PrimitiveType {
+            precision, scale, ..
+        } = self.schema
+        {
+            DataType::Decimal(*precision as usize, *scale as usize)
+        } else {

Review comment:
       Sorry, I think I am not familiar enough with Rust yet. I was under the impression that I have to provide an implementation for all possible branches and I get a compiler error when I remove the else branch. Is it possible to unwrap the if part somehow?
   
   ```
   error[E0317]: `if` may be missing an `else` clause
      --> parquet/src/arrow/schema.rs:650:9
       |
   650 | /         if let Type::PrimitiveType {
   651 | |             precision, scale, ..
   652 | |         } = self.schema
   653 | |         {
   654 | |             DataType::Decimal(*precision as usize, *scale as usize)
       | |             ------------------------------------------------------- found here
   655 | |         } 
       | |_________^ expected `()`, found enum `arrow::datatypes::DataType`
       |
       = note: `if` expressions without `else` evaluate to `()`
       = help: consider adding an `else` block that evaluates to the expected type
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r551540021



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       Thanks for the update. This PR looks good and we can address the remaining in a followup.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-753307820


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=h1) Report
   > Merging [#9047](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=desc) (7ee5fe6) into [master](https://codecov.io/gh/apache/arrow/commit/cc0ee5efcf9f6a67bcc407a11e8553a0409275c1?el=desc) (cc0ee5e) will **increase** coverage by `0.07%`.
   > The diff coverage is `80.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9047/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9047      +/-   ##
   ==========================================
   + Coverage   82.55%   82.62%   +0.07%     
   ==========================================
     Files         203      203              
     Lines       50043    50153     +110     
   ==========================================
   + Hits        41313    41439     +126     
   + Misses       8730     8714      -16     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.52% <50.00%> (+0.89%)` | :arrow_up: |
   | [rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=) | `89.52% <50.00%> (-0.42%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.77% <91.66%> (+0.42%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.63% <100.00%> (+0.02%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd19yZWFkZXIucnM=) | `91.57% <100.00%> (+0.18%)` | :arrow_up: |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <0.00%> (-3.70%)` | :arrow_down: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `56.17% <0.00%> (-2.90%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `67.85% <0.00%> (-2.52%)` | :arrow_down: |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `92.60% <0.00%> (-1.66%)` | :arrow_down: |
   | [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `93.15% <0.00%> (-1.34%)` | :arrow_down: |
   | ... and [14 more](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=footer). Last update [cc0ee5e...7ee5fe6](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-752956304


   The full set of Rust CI tests did not run on this PR :(
   
   Can you please rebase this PR against [apache/master](https://github.com/apache/arrow) to pick up the changes in https://github.com/apache/arrow/pull/9056 so that they do? 
   
   I apologize for the inconvenience. 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550881168



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       I added an implementation using converters, but the creation of `ParquetTypeConverter` occurs before we even get to `build_for_primitive_type_inner`:
   
   ```
   ParquetFileArrowReader::get_record_reader ->
   ParquetFileArrowReader::get_record_reader_by_columns -> 
   ParquetFileArrowReader::get_schema ->
   schema::parquet_to_arrow_schema ->
   schema::parquet_to_arrow_schema_by_columns ->
   ParquetTypeConverter::new
   ```
   Thus, if we want to avoid the decimal case, a decimal specific implementation in `get_schema` would probably be required.
   
   The converters could avoid the decimal specific branch in PrimitiveArrayReader::next_batch:
   
   ```
   ArrowType::Decimal(p, s) => {
                   let to_int64 = arrow::compute::cast(&array, &ArrowType::Int64)?;
                   let mut builder = DecimalBuilder::new(to_int64.len(), *p, *s);
                   let values = to_int64.as_any().downcast_ref::<Int64Array>().unwrap();
                   for maybe_value in values.iter() {
                       match maybe_value {
                           Some(value) => builder.append_value(value as i128)?,
                           None => builder.append_null()?
                       }
                   }
                   Arc::new(builder.finish()) as ArrayRef
               }
   ```
   
   We could avoid the cast to i64 by adding converters for i32/i64 for but I think this is not such a big deal - what do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-752563024


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #9047: Arrow 11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-752615117


   ARROW-${JIRA_ID} :)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-753307820


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=h1) Report
   > Merging [#9047](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=desc) (6a45815) into [master](https://codecov.io/gh/apache/arrow/commit/cc0ee5efcf9f6a67bcc407a11e8553a0409275c1?el=desc) (cc0ee5e) will **increase** coverage by `0.00%`.
   > The diff coverage is `94.44%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9047/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master    #9047   +/-   ##
   =======================================
     Coverage   82.55%   82.56%           
   =======================================
     Files         203      203           
     Lines       50043    50058   +15     
   =======================================
   + Hits        41313    41330   +17     
   + Misses       8730     8728    -2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.46% <88.88%> (+0.12%)` | :arrow_up: |
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.06% <88.88%> (+0.44%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.69% <100.00%> (+0.08%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd19yZWFkZXIucnM=) | `91.57% <100.00%> (+0.18%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=footer). Last update [cc0ee5e...6a45815](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-753307820


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=h1) Report
   > Merging [#9047](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=desc) (5e8645c) into [master](https://codecov.io/gh/apache/arrow/commit/cc0ee5efcf9f6a67bcc407a11e8553a0409275c1?el=desc) (cc0ee5e) will **increase** coverage by `0.06%`.
   > The diff coverage is `78.57%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9047/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9047      +/-   ##
   ==========================================
   + Coverage   82.55%   82.61%   +0.06%     
   ==========================================
     Files         203      203              
     Lines       50043    50161     +118     
   ==========================================
   + Hits        41313    41441     +128     
   + Misses       8730     8720      -10     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.52% <50.00%> (+0.89%)` | :arrow_up: |
   | [rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=) | `89.52% <50.00%> (-0.42%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.46% <88.88%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.63% <100.00%> (+0.02%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd19yZWFkZXIucnM=) | `91.57% <100.00%> (+0.18%)` | :arrow_up: |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <0.00%> (-3.70%)` | :arrow_down: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `56.17% <0.00%> (-2.90%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `67.85% <0.00%> (-2.52%)` | :arrow_down: |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `92.60% <0.00%> (-1.66%)` | :arrow_down: |
   | [rust/arrow/src/csv/reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY3N2L3JlYWRlci5ycw==) | `93.15% <0.00%> (-1.34%)` | :arrow_down: |
   | ... and [15 more](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=footer). Last update [cc0ee5e...7ee5fe6](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550877957



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,18 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> DataType {
+        assert!(self.schema.is_primitive());
+        if let Type::PrimitiveType {
+            precision, scale, ..
+        } = self.schema
+        {
+            DataType::Decimal(*precision as usize, *scale as usize)
+        } else {

Review comment:
       Great point! I added both methods which cleans up `to_decimal` quite a bit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r551681563



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       np @sweb - thanks for the contribution!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550905767



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       Got it. This looks good but why we need to do another cast on the arrow array though? it could hurt performance. Is it possible to directly build decimal array from the input parquet data?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550468985



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -532,6 +533,7 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Int64, Int32) => cast_numeric_arrays::<Int64Type, Int32Type>(array),
         (Int64, Float32) => cast_numeric_arrays::<Int64Type, Float32Type>(array),
         (Int64, Float64) => cast_numeric_arrays::<Int64Type, Float64Type>(array),
+        (Int64, Decimal(p, s)) => int64_to_decimal_cast(array, *p, *s),

Review comment:
       I think you stumbled upon the issue I tried to explain in the PR description - I am now sure that it was a bad idea to use casts in this way :) 
   
   I removed the cast operations and replaced it with some logic in `PrimitiveArrayReader::next_batch`.

##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       I am having a hard time finding the right place to construct the decimal type. My main concern with the changes I made is that Decimal is the only type that needs additional info from the schema in order to define precision and scale - all other types can be built directly from the logical type. However, if I put it in `PrimitiveArrayReader::new` I am only able to accomplish this by reimplementing parts of the ParquetTypeConverter specifically for Decimal. Do you have a suggestion how I can handle this in a better way?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550799082



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -532,6 +533,7 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Int64, Int32) => cast_numeric_arrays::<Int64Type, Int32Type>(array),
         (Int64, Float32) => cast_numeric_arrays::<Int64Type, Float32Type>(array),
         (Int64, Float64) => cast_numeric_arrays::<Int64Type, Float64Type>(array),
+        (Int64, Decimal(p, s)) => int64_to_decimal_cast(array, *p, *s),

Review comment:
       yeah this looks a little weird - I'm thinking that instead of handling this in the compute kernel, the parquet/arrow reader should just understand the conversion from parquet's decimal type to that of arrow's and generate decimal arrow directly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r551575727



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       @sunchao thank you for your help with this PR. The result is much better than my initial proposal!




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550975265



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -591,6 +591,7 @@ impl ParquetTypeConverter<'_> {
             LogicalType::INT_32 => Ok(DataType::Int32),
             LogicalType::DATE => Ok(DataType::Date32(DateUnit::Day)),
             LogicalType::TIME_MILLIS => Ok(DataType::Time32(TimeUnit::Millisecond)),
+            LogicalType::DECIMAL => self.to_decimal(),

Review comment:
       The cast is necessary, because downcasting an i32 array to i64 results in None, panicking as a result. I have removed the cast and handle the building of the decimal array now separately, but I do not really like my solution:
   
   ```
                   let mut builder = DecimalBuilder::new(array.len(), *p, *s);
                   match array.data_type() {
                       ArrowType::Int32 => {
                           let values = array.as_any().downcast_ref::<Int32Array>().unwrap();
                           for maybe_value in values.iter() {
                               match maybe_value {
                                   Some(value) => builder.append_value(value as i128)?,
                                   None => builder.append_null()?,
                               }
                           }
                       }
                       ArrowType::Int64 => {
                           let values = array.as_any().downcast_ref::<Int64Array>().unwrap();
                           for maybe_value in values.iter() {
                               match maybe_value {
                                   Some(value) => builder.append_value(value as i128)?,
                                   None => builder.append_null()?,
                               }
                           }
                       }
                       _ => {
                           return Err(ArrowError(format!(
                               "Cannot convert {:?} to decimal",
                               array.data_type()
                           )))
                       }
                   }
   ```
   
   At the moment, I am not able to convince the compiler to treat downcasts to i32 and i64 as the same primitive array and then use `as` on the resulting iterator to convert from types that are unknown at compile time to `i128`. I will probably need to spend some time figuring this out.
   
   Alternatively, we could use converters, similar to the fixed length case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9047:
URL: https://github.com/apache/arrow/pull/9047


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-753307820


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=h1) Report
   > Merging [#9047](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=desc) (6a63088) into [master](https://codecov.io/gh/apache/arrow/commit/cc0ee5efcf9f6a67bcc407a11e8553a0409275c1?el=desc) (cc0ee5e) will **increase** coverage by `0.06%`.
   > The diff coverage is `77.50%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9047/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9047      +/-   ##
   ==========================================
   + Coverage   82.55%   82.61%   +0.06%     
   ==========================================
     Files         203      203              
     Lines       50043    49966      -77     
   ==========================================
   - Hits        41313    41280      -33     
   + Misses       8730     8686      -44     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.50% <33.33%> (+0.88%)` | :arrow_up: |
   | [rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=) | `89.52% <50.00%> (-0.42%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.46% <88.88%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.63% <100.00%> (+0.02%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd19yZWFkZXIucnM=) | `91.57% <100.00%> (+0.18%)` | :arrow_up: |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <0.00%> (-3.70%)` | :arrow_down: |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `97.72% <0.00%> (-0.49%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_string.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfc3RyaW5nLnJz) | `90.16% <0.00%> (-0.11%)` | :arrow_down: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `58.99% <0.00%> (-0.08%)` | :arrow_down: |
   | ... and [4 more](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=footer). Last update [cc0ee5e...6a63088](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-753307820


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=h1) Report
   > Merging [#9047](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=desc) (ba1c6aa) into [master](https://codecov.io/gh/apache/arrow/commit/cc0ee5efcf9f6a67bcc407a11e8553a0409275c1?el=desc) (cc0ee5e) will **increase** coverage by `0.05%`.
   > The diff coverage is `73.21%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9047/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9047      +/-   ##
   ==========================================
   + Coverage   82.55%   82.61%   +0.05%     
   ==========================================
     Files         203      204       +1     
     Lines       50043    50213     +170     
   ==========================================
   + Hits        41313    41481     +168     
   - Misses       8730     8732       +2     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/parquet/src/arrow/schema.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9zY2hlbWEucnM=) | `91.52% <50.00%> (+0.89%)` | :arrow_up: |
   | [rust/parquet/src/schema/types.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9zY2hlbWEvdHlwZXMucnM=) | `89.52% <50.00%> (-0.42%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `75.44% <69.56%> (+0.09%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.63% <100.00%> (+0.02%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd19yZWFkZXIucnM=) | `91.57% <100.00%> (+0.18%)` | :arrow_up: |
   | [...datafusion/src/optimizer/hash\_build\_probe\_order.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvaGFzaF9idWlsZF9wcm9iZV9vcmRlci5ycw==) | `54.73% <0.00%> (-3.70%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/hash\_join.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfam9pbi5ycw==) | `86.32% <0.00%> (-3.62%)` | :arrow_down: |
   | [rust/datafusion/src/scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9zY2FsYXIucnM=) | `56.17% <0.00%> (-2.90%)` | :arrow_down: |
   | [rust/datafusion/src/physical\_plan/group\_scalar.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2dyb3VwX3NjYWxhci5ycw==) | `67.85% <0.00%> (-2.52%)` | :arrow_down: |
   | [...tafusion/src/physical\_plan/datetime\_expressions.rs](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2RhdGV0aW1lX2V4cHJlc3Npb25zLnJz) | `92.60% <0.00%> (-1.66%)` | :arrow_down: |
   | ... and [19 more](https://codecov.io/gh/apache/arrow/pull/9047/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=footer). Last update [cc0ee5e...ba1c6aa](https://codecov.io/gh/apache/arrow/pull/9047?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550798940



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> Result<DataType> {
+        let (precision, scale) = match self.schema {
+            Type::PrimitiveType {
+                ref precision,
+                ref scale,
+                ..
+            } => (*precision, *scale),
+            _ => {
+                return Err(ArrowError(

Review comment:
       Thanks. Looking good! the `else` branch in the `if let` block seems unnecessary?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] removed a comment on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
github-actions[bot] removed a comment on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-752563024


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550799126



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,18 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> DataType {
+        assert!(self.schema.is_primitive());
+        if let Type::PrimitiveType {
+            precision, scale, ..
+        } = self.schema
+        {
+            DataType::Decimal(*precision as usize, *scale as usize)
+        } else {

Review comment:
       can we remove this branch?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550799082



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -532,6 +533,7 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Int64, Int32) => cast_numeric_arrays::<Int64Type, Int32Type>(array),
         (Int64, Float32) => cast_numeric_arrays::<Int64Type, Float32Type>(array),
         (Int64, Float64) => cast_numeric_arrays::<Int64Type, Float64Type>(array),
+        (Int64, Decimal(p, s)) => int64_to_decimal_cast(array, *p, *s),

Review comment:
       yeah this looks a little weird - I'm thinking that instead of handling this in the compute kernel, the parquet/arrow reader should just understand the conversion from parquet's decimal type to that of arrow's and generate decimal arrow arrays directly.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550760559



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> Result<DataType> {
+        let (precision, scale) = match self.schema {
+            Type::PrimitiveType {
+                ref precision,
+                ref scale,
+                ..
+            } => (*precision, *scale),
+            _ => {
+                return Err(ArrowError(

Review comment:
       Thank you very much! I changed it to use an assertion. Let me know if you prefer this variant - there are other methods in `schema.rs` that follow a similar pattern and I could also change them accordingly while I am already here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#issuecomment-752645838


   https://issues.apache.org/jira/browse/ARROW-11072


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550576451



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> Result<DataType> {
+        let (precision, scale) = match self.schema {
+            Type::PrimitiveType {
+                ref precision,
+                ref scale,
+                ..
+            } => (*precision, *scale),
+            _ => {
+                return Err(ArrowError(

Review comment:
       I'm thinking something like:
   ```rust
   assert!(self.schema.is_primitive());
   if let ... {
     ...
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sweb commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sweb commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550468948



##########
File path: rust/parquet/src/arrow/schema.rs
##########
@@ -657,6 +645,22 @@ impl ParquetTypeConverter<'_> {
         }
     }
 
+    fn to_decimal(&self) -> Result<DataType> {
+        let (precision, scale) = match self.schema {
+            Type::PrimitiveType {
+                ref precision,
+                ref scale,
+                ..
+            } => (*precision, *scale),
+            _ => {
+                return Err(ArrowError(

Review comment:
       @sunchao thank you for your review comments! I changed this part as suggested. Do you have an example of an assertion checking for a type? The only thing I could come up with (still quite new to Rust) is something along the lines of
   
   ```
   if let Type::PrimitiveType {ref precision, ref scale, .. } = self.schema {
     DataType::Decimal(*precision as usize, *scale as usize)
   } else {
     panic!("Expected a physical type, not a group type")
   }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] sunchao commented on a change in pull request #9047: ARROW-11072: [Rust] [Parquet] Support reading decimal from physical int types

Posted by GitBox <gi...@apache.org>.
sunchao commented on a change in pull request #9047:
URL: https://github.com/apache/arrow/pull/9047#discussion_r550798904



##########
File path: rust/arrow/src/compute/kernels/cast.rs
##########
@@ -532,6 +533,7 @@ pub fn cast(array: &ArrayRef, to_type: &DataType) -> Result<ArrayRef> {
         (Int64, Int32) => cast_numeric_arrays::<Int64Type, Int32Type>(array),
         (Int64, Float32) => cast_numeric_arrays::<Int64Type, Float32Type>(array),
         (Int64, Float64) => cast_numeric_arrays::<Int64Type, Float64Type>(array),
+        (Int64, Decimal(p, s)) => int64_to_decimal_cast(array, *p, *s),

Review comment:
       Thanks. Looking good! the `else` branch of the `if let` block seems unnecessary?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org