You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/06 10:53:17 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

jorgecarleitao opened a new pull request #8853:
URL: https://github.com/apache/arrow/pull/8853


   This PR:
   
   * extends the types that `concat` support for all types that `MutableArrayData` supports (i.e. it now supports nested Lists, all primitives, boolean, string and large string, etc.)
   * makes `concat` 6x faster for primitive types and 2x faster for string types (and likely also for the other types)
   * changes `concat`'s signature to `&[&Array]` instead of `&[Arc<Array>]`, to avoid an `Arc::clone`.
   
   Since `XBuilder::append_data` was specifically built for this kernel but is not used, and `MutableArrayData` offers a more generic API for it, this PR removes that code.
   
   The overall principle for this removal is that `Builder` is the API to build an arrow array from elements or slices of rust native types, while the `MutableArrayData` (for a lack of a better name) is suited to build an arrow array from an existing set of arrow arrays. In the case of `concat`, this corresponds to mem-copies of the individual arrays (taking into account nulls and all that stuff) in sequence.
   
   Based on this principle, `Builder` does not need to know how to build an array from existing arrays (the `append_data`).
   
   I would like to migrate all the tests for the `XBuilder::append_data` to the `MutableArrayData`, to not lose them, but for that #8850 #8852 #8851 and #8849 and #8848 needs to land first (thus being a draft).
   
   Benchmarks:
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | concat str 1024 | -45.3 | 
   | concat str nulls 1024 | -61.1 | 
   | concat i32 1024 | -83.5 | 
   | concat i32 nulls 1024 | -86.1 |
   
   ```
   git checkout 66468daf0b3ac3ef08b7c99c690e7b845f23ad2b
   cargo bench --bench concatenate_kernel
   git checkout concat
   cargo bench --bench concatenate_kernel
   ```
   
   ```
   Previous HEAD position was 66468daf0 Added concatenate bench
   Switched to branch 'concat'
      Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
       Finished bench [optimized] target(s) in 58.72s
        Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/concatenate_kernel-94b8f5621cd4f767
   Gnuplot not found, using plotters backend
   concat i32 1024         time:   [4.2852 us 4.2912 us 4.2973 us]                             
                           change: [-83.690% -83.469% -83.188%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     1 (1.00%) low severe
     4 (4.00%) low mild
     3 (3.00%) high mild
     5 (5.00%) high severe
   
   concat i32 nulls 1024   time:   [4.8617 us 4.8820 us 4.9080 us]                                   
                           change: [-86.335% -86.101% -85.813%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     2 (2.00%) low mild
     4 (4.00%) high mild
     4 (4.00%) high severe
   
   concat str 1024         time:   [19.472 us 19.527 us 19.593 us]                             
                           change: [-46.212% -45.314% -44.341%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 11 outliers among 100 measurements (11.00%)
     4 (4.00%) low mild
     4 (4.00%) high mild
     3 (3.00%) high severe
   
   concat str nulls 1024   time:   [39.447 us 39.525 us 39.613 us]                                   
                           change: [-61.858% -61.091% -60.311%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 13 outliers among 100 measurements (13.00%)
     3 (3.00%) low mild
     5 (5.00%) high mild
     5 (5.00%) high severe
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r541858977



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -3969,579 +3296,4 @@ mod tests {
         // Special error if the key overflows (256th entry)
         builder.append(1257).unwrap();
     }
-
-    #[test]
-    fn test_primitive_append() -> Result<()> {

Review comment:
       I have incorporated all tests. There were some small bugs in the tests wrt to nulls, and also a bug in the `MutableDataArray`. So, overall, everyone won there.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #8853: ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #8853:
URL: https://github.com/apache/arrow/pull/8853


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-743700871


   I have now migrated all tests to the `MutableArrayData`, which demonstrated a bug. The bug is fixed as part of this PR.
   
   This is now ready to review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r538032573



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -3969,579 +3296,4 @@ mod tests {
         // Special error if the key overflows (256th entry)
         builder.append(1257).unwrap();
     }
-
-    #[test]
-    fn test_primitive_append() -> Result<()> {

Review comment:
       is there value in porting some of these edge-case tests for mutable array?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8853: ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r542643059



##########
File path: rust/arrow/src/array/array_binary.rs
##########
@@ -373,6 +376,45 @@ impl From<Vec<Vec<u8>>> for FixedSizeBinaryArray {
     }
 }
 
+impl From<Vec<Option<Vec<u8>>>> for FixedSizeBinaryArray {
+    fn from(data: Vec<Option<Vec<u8>>>) -> Self {
+        let len = data.len();
+        assert!(len > 0);
+        // try to estimate the size. This may not be possible no entry is valid => panic

Review comment:
       Filed https://issues.apache.org/jira/browse/ARROW-10903




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #8853: ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-744642738


   Time to 🚢 🇮🇹 !


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-743701408


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=h1) Report
   > Merging [#8853](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=desc) (1f3c773) into [master](https://codecov.io/gh/apache/arrow/commit/2816f37ff01cfd31101b3ee6cc8574cc9246dd1b?el=desc) (2816f37) will **decrease** coverage by `23.19%`.
   > The diff coverage is `99.19%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8853/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=tree)
   
   ```diff
   @@             Coverage Diff             @@
   ##           master    #8853       +/-   ##
   ===========================================
   - Coverage   76.99%   53.79%   -23.20%     
   ===========================================
     Files         174      170        -4     
     Lines       40392    30069    -10323     
   ===========================================
   - Hits        31099    16176    -14923     
   - Misses       9293    13893     +4600     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `83.10% <ø> (+1.36%)` | :arrow_up: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/physical\_plan/sort.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3NvcnQucnM=) | `0.00% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.51% <91.30%> (+0.04%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <100.00%> (+36.84%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==) | `83.90% <100.00%> (+7.05%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbmNhdC5ycw==) | `100.00% <100.00%> (+18.75%)` | :arrow_up: |
   | [rust/arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZmZpLnJz) | `70.61% <100.00%> (ø)` | |
   | [rust/parquet/src/column/page.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9jb2x1bW4vcGFnZS5ycw==) | `0.00% <0.00%> (-98.69%)` | :arrow_down: |
   | [rust/parquet/src/record/api.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9yZWNvcmQvYXBpLnJz) | `0.00% <0.00%> (-98.15%)` | :arrow_down: |
   | ... and [60 more](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=footer). Last update [2816f37...1f3c773](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8853: ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r542189448



##########
File path: rust/arrow/src/array/array_binary.rs
##########
@@ -373,6 +376,45 @@ impl From<Vec<Vec<u8>>> for FixedSizeBinaryArray {
     }
 }
 
+impl From<Vec<Option<Vec<u8>>>> for FixedSizeBinaryArray {
+    fn from(data: Vec<Option<Vec<u8>>>) -> Self {
+        let len = data.len();
+        assert!(len > 0);
+        // try to estimate the size. This may not be possible no entry is valid => panic

Review comment:
       Note that in general we should avoid using these because they require two allocations (rust's `Vec` and arrow buffers). This function is mostly useful for testing.
   
   I would be ok with replacing them by the `FromIter` constructor, which is more performance, more general, and has the same ergonomics (`from(vec![].into_iter())` instead of `from(vec![...])` for a vector). This way we do not need to worry about these.
   
   The challenge with fixed sized items is that they require knowledge of the size. This would be nicely solved by accepting `Option<[T; T: usize]>`, but Rust's support for constant generics is slim atm.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] houqp commented on a change in pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r538703517



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -3969,579 +3296,4 @@ mod tests {
         // Special error if the key overflows (256th entry)
         builder.append(1257).unwrap();
     }
-
-    #[test]
-    fn test_primitive_append() -> Result<()> {

Review comment:
       Ha, my bad, my eyes couldn't resist to skip that part and went into the bechmark section :P




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #8853: ARROW-10827: [Rust] Move concat from builders to a compute kernel and make it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r541919322



##########
File path: rust/arrow/src/array/array_binary.rs
##########
@@ -373,6 +376,45 @@ impl From<Vec<Vec<u8>>> for FixedSizeBinaryArray {
     }
 }
 
+impl From<Vec<Option<Vec<u8>>>> for FixedSizeBinaryArray {
+    fn from(data: Vec<Option<Vec<u8>>>) -> Self {
+        let len = data.len();
+        assert!(len > 0);
+        // try to estimate the size. This may not be possible no entry is valid => panic

Review comment:
       This behavior (panic'ing') doesn't seem ideal, though I realize there isn't much useful to do when converting a Vec of entirely `None` -- maybe we could just return a zero length array. 
   
   Could definitely be done as a follow on PR 

##########
File path: rust/arrow/src/array/array_binary.rs
##########
@@ -373,6 +376,45 @@ impl From<Vec<Vec<u8>>> for FixedSizeBinaryArray {
     }
 }
 
+impl From<Vec<Option<Vec<u8>>>> for FixedSizeBinaryArray {
+    fn from(data: Vec<Option<Vec<u8>>>) -> Self {
+        let len = data.len();
+        assert!(len > 0);
+        // try to estimate the size. This may not be possible no entry is valid => panic
+        let size = data.iter().filter_map(|e| e.as_ref()).next().unwrap().len();
+        assert!(data

Review comment:
       Given this operation can fail (if all the elements are not the same length) perhaps we should implement `TryFrom` instead of `From` and `panic` -- again, this would be an excellent follow on PR (or maybe file a ticket and it could be an excellent first contribution for someone who wanted to contribute)

##########
File path: rust/arrow/src/compute/kernels/concat.rs
##########
@@ -20,154 +20,60 @@
 //! Example:
 //!
 //! ```
-//! use std::sync::Arc;
 //! use arrow::array::{ArrayRef, StringArray};
 //! use arrow::compute::concat;
 //!
-//! let arr = concat(&vec![
-//!     Arc::new(StringArray::from(vec!["hello", "world"])) as ArrayRef,
-//!     Arc::new(StringArray::from(vec!["!"])) as ArrayRef,
+//! let arr = concat(&[
+//!     &StringArray::from(vec!["hello", "world"]),
+//!     &StringArray::from(vec!["!"]),
 //! ]).unwrap();
 //! assert_eq!(arr.len(), 3);
 //! ```
 
+use std::sync::Arc;
+
 use crate::array::*;
-use crate::datatypes::*;
 use crate::error::{ArrowError, Result};
 
-use TimeUnit::*;
-
-/// Concatenate multiple `ArrayRef` with the same type.
-///
-/// Returns a new ArrayRef.
-pub fn concat(array_list: &[ArrayRef]) -> Result<ArrayRef> {
-    if array_list.is_empty() {
+/// Concatenate multiple [Array] of the same type into a single [ArrayRef].
+pub fn concat(arrays: &[&Array]) -> Result<ArrayRef> {
+    if arrays.is_empty() {
         return Err(ArrowError::ComputeError(
             "concat requires input of at least one array".to_string(),
         ));
     }
-    let array_data_list = &array_list
+
+    if arrays
         .iter()
-        .map(|a| a.data_ref().clone())
-        .collect::<Vec<ArrayDataRef>>();
-
-    match array_data_list[0].data_type() {
-        DataType::Utf8 => {
-            let mut builder = StringBuilder::new(0);
-            builder.append_data(array_data_list)?;
-            Ok(ArrayBuilder::finish(&mut builder))
-        }
-        DataType::Boolean => {
-            let mut builder = BooleanArray::builder(0);
-            builder.append_data(array_data_list)?;
-            Ok(ArrayBuilder::finish(&mut builder))
-        }
-        DataType::Int8 => concat_primitive::<Int8Type>(array_data_list),
-        DataType::Int16 => concat_primitive::<Int16Type>(array_data_list),
-        DataType::Int32 => concat_primitive::<Int32Type>(array_data_list),
-        DataType::Int64 => concat_primitive::<Int64Type>(array_data_list),
-        DataType::UInt8 => concat_primitive::<UInt8Type>(array_data_list),
-        DataType::UInt16 => concat_primitive::<UInt16Type>(array_data_list),
-        DataType::UInt32 => concat_primitive::<UInt32Type>(array_data_list),
-        DataType::UInt64 => concat_primitive::<UInt64Type>(array_data_list),
-        DataType::Float32 => concat_primitive::<Float32Type>(array_data_list),
-        DataType::Float64 => concat_primitive::<Float64Type>(array_data_list),
-        DataType::Date32(_) => concat_primitive::<Date32Type>(array_data_list),
-        DataType::Date64(_) => concat_primitive::<Date64Type>(array_data_list),
-        DataType::Time32(Second) => concat_primitive::<Time32SecondType>(array_data_list),
-        DataType::Time32(Millisecond) => {
-            concat_primitive::<Time32MillisecondType>(array_data_list)
-        }
-        DataType::Time64(Microsecond) => {
-            concat_primitive::<Time64MicrosecondType>(array_data_list)
-        }
-        DataType::Time64(Nanosecond) => {
-            concat_primitive::<Time64NanosecondType>(array_data_list)
-        }
-        DataType::Timestamp(Second, _) => {
-            concat_primitive::<TimestampSecondType>(array_data_list)
-        }
-        DataType::Timestamp(Millisecond, _) => {
-            concat_primitive::<TimestampMillisecondType>(array_data_list)
-        }
-        DataType::Timestamp(Microsecond, _) => {
-            concat_primitive::<TimestampMicrosecondType>(array_data_list)
-        }
-        DataType::Timestamp(Nanosecond, _) => {
-            concat_primitive::<TimestampNanosecondType>(array_data_list)
-        }
-        DataType::Interval(IntervalUnit::YearMonth) => {
-            concat_primitive::<IntervalYearMonthType>(array_data_list)
-        }
-        DataType::Interval(IntervalUnit::DayTime) => {
-            concat_primitive::<IntervalDayTimeType>(array_data_list)
-        }
-        DataType::Duration(TimeUnit::Second) => {
-            concat_primitive::<DurationSecondType>(array_data_list)
-        }
-        DataType::Duration(TimeUnit::Millisecond) => {
-            concat_primitive::<DurationMillisecondType>(array_data_list)
-        }
-        DataType::Duration(TimeUnit::Microsecond) => {
-            concat_primitive::<DurationMicrosecondType>(array_data_list)
-        }
-        DataType::Duration(TimeUnit::Nanosecond) => {
-            concat_primitive::<DurationNanosecondType>(array_data_list)
-        }
-        DataType::List(nested_field) => {
-            concat_list(array_data_list, nested_field.data_type())
-        }
-        t => Err(ArrowError::ComputeError(format!(
-            "Concat not supported for data type {:?}",
-            t
-        ))),
+        .any(|array| array.data_type() != arrays[0].data_type())
+    {
+        return Err(ArrowError::InvalidArgumentError(
+            "It is not possible to concatenate arrays of different data types."
+                .to_string(),
+        ));
     }
-}
 
-#[inline]
-fn concat_primitive<T>(array_data_list: &[ArrayDataRef]) -> Result<ArrayRef>
-where
-    T: ArrowNumericType,
-{
-    let mut builder = PrimitiveArray::<T>::builder(0);
-    builder.append_data(array_data_list)?;
-    Ok(ArrayBuilder::finish(&mut builder))
-}
+    let lengths = arrays.iter().map(|array| array.len()).collect::<Vec<_>>();
+    let capacity = lengths.iter().sum();
 
-#[inline]
-fn concat_primitive_list<T>(array_data_list: &[ArrayDataRef]) -> Result<ArrayRef>
-where
-    T: ArrowNumericType,
-{
-    let mut builder = ListBuilder::new(PrimitiveArray::<T>::builder(0));
-    builder.append_data(array_data_list)?;
-    Ok(ArrayBuilder::finish(&mut builder))
-}
+    let arrays = arrays
+        .iter()
+        .map(|a| a.data_ref().as_ref())
+        .collect::<Vec<_>>();
 
-#[inline]
-fn concat_list(
-    array_data_list: &[ArrayDataRef],
-    data_type: &DataType,
-) -> Result<ArrayRef> {
-    match data_type {
-        DataType::Int8 => concat_primitive_list::<Int8Type>(array_data_list),
-        DataType::Int16 => concat_primitive_list::<Int16Type>(array_data_list),
-        DataType::Int32 => concat_primitive_list::<Int32Type>(array_data_list),
-        DataType::Int64 => concat_primitive_list::<Int64Type>(array_data_list),
-        DataType::UInt8 => concat_primitive_list::<UInt8Type>(array_data_list),
-        DataType::UInt16 => concat_primitive_list::<UInt16Type>(array_data_list),
-        DataType::UInt32 => concat_primitive_list::<UInt32Type>(array_data_list),
-        DataType::UInt64 => concat_primitive_list::<UInt64Type>(array_data_list),
-        t => Err(ArrowError::ComputeError(format!(
-            "Concat not supported for list with data type {:?}",
-            t
-        ))),
+    let mut mutable = MutableArrayData::new(arrays, false, capacity);
+
+    for (i, len) in lengths.iter().enumerate() {
+        mutable.extend(i, 0, *len)

Review comment:
       well, that is certainly nicer




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-743701408


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=h1) Report
   > Merging [#8853](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=desc) (9adc230) into [master](https://codecov.io/gh/apache/arrow/commit/2816f37ff01cfd31101b3ee6cc8574cc9246dd1b?el=desc) (2816f37) will **increase** coverage by `0.05%`.
   > The diff coverage is `99.19%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8853/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8853      +/-   ##
   ==========================================
   + Coverage   76.99%   77.04%   +0.05%     
   ==========================================
     Files         174      181       +7     
     Lines       40392    40349      -43     
   ==========================================
   - Hits        31099    31088      -11     
   + Misses       9293     9261      -32     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/builder.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYnVpbGRlci5ycw==) | `84.17% <ø> (+2.43%)` | :arrow_up: |
   | [...ust/datafusion/src/physical\_plan/hash\_aggregate.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL2hhc2hfYWdncmVnYXRlLnJz) | `0.00% <ø> (ø)` | |
   | [rust/datafusion/src/physical\_plan/sort.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9waHlzaWNhbF9wbGFuL3NvcnQucnM=) | `0.00% <ø> (ø)` | |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.51% <91.30%> (+0.04%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <100.00%> (+36.84%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL21vZC5ycw==) | `83.90% <100.00%> (+7.05%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbmNhdC5ycw==) | `100.00% <100.00%> (+18.75%)` | :arrow_up: |
   | [rust/arrow/src/ffi.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZmZpLnJz) | `70.61% <100.00%> (ø)` | |
   | [rust/datafusion/src/optimizer/utils.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9vcHRpbWl6ZXIvdXRpbHMucnM=) | `0.00% <0.00%> (ø)` | |
   | [rust/datafusion/src/execution/context.rs](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree#diff-cnVzdC9kYXRhZnVzaW9uL3NyYy9leGVjdXRpb24vY29udGV4dC5ycw==) | `0.00% <0.00%> (ø)` | |
   | ... and [18 more](https://codecov.io/gh/apache/arrow/pull/8853/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=footer). Last update [2816f37...9adc230](https://codecov.io/gh/apache/arrow/pull/8853?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-743700871


   I have now migrated all tests to the `MutableArrayData`, which demonstrated a bug. The bug is fixed as part of this PR.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#issuecomment-739487074


   https://issues.apache.org/jira/browse/ARROW-10827


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8853: ARROW-10827: [Rust] Extended concat and made it faster (2-6x)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #8853:
URL: https://github.com/apache/arrow/pull/8853#discussion_r538034958



##########
File path: rust/arrow/src/array/builder.rs
##########
@@ -3969,579 +3296,4 @@ mod tests {
         // Special error if the key overflows (256th entry)
         builder.append(1257).unwrap();
     }
-
-    #[test]
-    fn test_primitive_append() -> Result<()> {

Review comment:
       Yes, there is. PR's description:
   
   > I would like to migrate all the tests for the XBuilder::append_data to the MutableArrayData, to not lose them, but for that #8850 #8852 #8851 and #8849 and #8848 needs to land first (thus being a draft).
   
   ;)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org