Posted to commits@arrow.apache.org by al...@apache.org on 2021/01/22 15:20:22 UTC

[arrow] branch master updated: ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec

This is an automated email from the ASF dual-hosted git repository.

alamb pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git


The following commit(s) were added to refs/heads/master by this push:
     new 262bbdc  ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec
262bbdc is described below

commit 262bbdca6d53419457eadc8ab0c35ba441a4f9b1
Author: Heres, Daniel <da...@gmail.com>
AuthorDate: Fri Jan 22 10:19:10 2021 -0500

    ARROW-11332: [Rust] Use MutableBuffer in take_string instead of Vec
    
    This PR changes `take_string` to build the values buffer directly in a `MutableBuffer` instead of filling a `Vec<u8>` and converting it into a buffer afterwards.
    Writing into a `MutableBuffer` directly used to carry some overhead compared to the `Vec` approach, but that overhead seems to be gone now.
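    For readers unfamiliar with the kernel, the sketch below shows the general shape of a string "take" over Arrow's offsets + values layout: each selected string's bytes are appended straight into one growing values buffer while a running length fills the offsets buffer. It is a simplified, std-only illustration, not the arrow implementation: `take_utf8` is a hypothetical name, null handling is omitted, and `Vec<u8>` stands in for `MutableBuffer` only so the example compiles without the arrow crate.

    ```rust
    /// Hypothetical, simplified take for a UTF-8 array in offsets + values layout.
    /// Nulls are ignored. `Vec<u8>` is a stand-in for a growable byte buffer such
    /// as arrow's `MutableBuffer`; it is not the arrow API.
    fn take_utf8(offsets: &[i32], values: &[u8], indices: &[usize]) -> (Vec<i32>, Vec<u8>) {
        // One offset per output element, plus the leading zero.
        let mut new_offsets = Vec::with_capacity(indices.len() + 1);
        // Start empty and let the buffer grow as bytes are appended.
        let mut new_values: Vec<u8> = Vec::new();
        let mut length_so_far: i32 = 0;
        new_offsets.push(length_so_far);

        for &i in indices {
            let start = offsets[i] as usize;
            let end = offsets[i + 1] as usize;
            // Copy the selected string's bytes directly into the output values buffer.
            new_values.extend_from_slice(&values[start..end]);
            length_so_far += (end - start) as i32;
            new_offsets.push(length_so_far);
        }
        (new_offsets, new_values)
    }

    fn main() {
        // Input array ["a", "bc", "def"] in offsets + values form.
        let offsets = [0, 1, 3, 6];
        let values = b"abcdef";
        let (o, v) = take_utf8(&offsets, values, &[2, 0, 2]);
        assert_eq!(o, vec![0, 3, 4, 7]);
        assert_eq!(v, b"defadef".to_vec());
        println!("{:?} {:?}", o, std::str::from_utf8(&v).unwrap());
    }
    ```

    Because the output is written in a single pass into one buffer, no intermediate per-string allocations are needed; the diff below makes the same kind of append target a `MutableBuffer` and converts it with `.into()` at the end.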
    
    According to the benchmarks, the change is roughly neutral, with results within a few percent. If there is any remaining overhead in `MutableBuffer`, I think we should fix that rather than keep workarounds that make this kernel inconsistent with the other kernels.
    
    ```
    take str 512            time:   [2.3304 us 2.3358 us 2.3419 us]
                            change: [-4.4130% -4.0693% -3.7241%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 3 outliers among 100 measurements (3.00%)
      2 (2.00%) high mild
      1 (1.00%) high severe
    
    take str 1024           time:   [4.3583 us 4.3633 us 4.3694 us]
                            change: [-0.5853% +1.1186% +2.9951%] (p = 0.29 > 0.05)
                            No change in performance detected.
    Found 16 outliers among 100 measurements (16.00%)
      3 (3.00%) low severe
      6 (6.00%) high mild
      7 (7.00%) high severe
    
    take str null indices 512
                            time:   [2.4779 us 2.4813 us 2.4844 us]
                            change: [-2.4765% -2.2000% -1.9437%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      3 (3.00%) low mild
      2 (2.00%) high mild
    
    take str null indices 1024
                            time:   [4.4823 us 4.4910 us 4.5053 us]
                            change: [-4.8482% -4.5426% -4.2894%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 5 outliers among 100 measurements (5.00%)
      1 (1.00%) low mild
      3 (3.00%) high mild
      1 (1.00%) high severe
    
    take str null values 1024
                            time:   [4.4856 us 4.4889 us 4.4920 us]
                            change: [-2.2093% -2.0471% -1.8925%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 7 outliers among 100 measurements (7.00%)
      2 (2.00%) low severe
      3 (3.00%) low mild
      1 (1.00%) high mild
      1 (1.00%) high severe
    
    take str null values null indices 1024
                            time:   [9.6438 us 9.6514 us 9.6592 us]
                            change: [-2.8600% -2.7478% -2.6338%] (p = 0.00 < 0.05)
                            Performance has improved.
    Found 2 outliers among 100 measurements (2.00%)
      1 (1.00%) high mild
      1 (1.00%) high severe
    ```
    
    Closes #9279 from Dandandan/take_string_opt
    
    Authored-by: Heres, Daniel <da...@gmail.com>
    Signed-off-by: Andrew Lamb <an...@nerdnetworks.org>
---
 rust/arrow/src/compute/kernels/take.rs | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/rust/arrow/src/compute/kernels/take.rs b/rust/arrow/src/compute/kernels/take.rs
index e60cad1..eaf3da2 100644
--- a/rust/arrow/src/compute/kernels/take.rs
+++ b/rust/arrow/src/compute/kernels/take.rs
@@ -423,7 +423,7 @@ where
     let mut offsets_buffer = MutableBuffer::from_len_zeroed(bytes_offset);
 
     let offsets = offsets_buffer.typed_data_mut();
-    let mut values = Vec::with_capacity(bytes_offset);
+    let mut values = MutableBuffer::new(0);
     let mut length_so_far = OffsetSize::zero();
     offsets[0] = length_so_far;
 
@@ -513,7 +513,7 @@ where
     let mut data = ArrayData::builder(<OffsetSize as StringOffsetSizeTrait>::DATA_TYPE)
         .len(data_len)
         .add_buffer(offsets_buffer.into())
-        .add_buffer(Buffer::from(values));
+        .add_buffer(values.into());
     if let Some(null_buffer) = nulls {
         data = data.null_bit_buffer(null_buffer);
     }