You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/17 18:57:05 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

jorgecarleitao opened a new pull request #9235:
URL: https://github.com/apache/arrow/pull/9235


   # Rational
   
   Rust forbids safely accessing uninitialized memory because it is undefined behavior. However, when building `Buffer`s, it is important to be able to _write_ to uninitialized memory regions, thereby avoiding the need to write _something_ to it before using it.
   
   Currently, all our initializations are zeroed, which is expensive. #9076 modifies our allocator to allocate uninitialized regions. However, by itself, this is not useful if we do not offer any methods to write to those (uninitialized) regions.
   
   # This PR
   
   This PR is built on top of #9076 and introduces methods to extend a `MutableBuffer` from an iterator (and an `ExactSizedIterator` when it is possible) to build a `MutableBuffer`, thereby offering a `safe` API to efficiently grow `MutableBuffer` without having to initialize memory regions with zeros (i.e. without `with_bitset` and the like).
   
   The design is heavily inspired in `Vec`, with the catch that we use stable Rust (i.e. no trait specialization), and thus have to expose a bit more methods than what `Vec` exposes. Also, unfortunately Rust does not support `collect()` for `ExactSizedIterator` and `TrustedLen` is `unstable`, which means that we can't use that (nicer) API for sized iterators based on `collect()`.
   
   The first commit is just a fix to a bench, that was taking the creation of an array into account. The second commit is the most important and contains the new APIs. The last 2 commits are examples of what this API looks like and what it can achieve (benches below).
   
   PS: using `ExactSizedIterator` is 2x faster than the `Iterator`. I have been fighting the compiler to try to have the same performance in both (as it is only a branch on the if), but the compiler is not being very friendly to me (related to https://github.com/rust-lang/rust/issues/32155).
   
   ```bash
   git checkout master
   cargo bench --bench arithmetic_kernels
   git checkout length_faster
   cargo bench --bench arithmetic_kernels
   
   git checkout 557e728b201ccb301b05e0cb1470782d37c6994c
   cargo bench --bench length_kernel
   git checkout length_faster
   ```
   
   ```
      Compiling arrow v3.0.0-SNAPSHOT (/Users/jorgecarleitao/projects/arrow/rust/arrow)
       Finished bench [optimized] target(s) in 1m 00s
        Running /Users/jorgecarleitao/projects/arrow/rust/target/release/deps/arithmetic_kernels-ec2cc20ce07d9b83
   Gnuplot not found, using plotters backend
   add 512                 time:   [509.72 ns 510.21 ns 510.69 ns]                     
                           change: [-24.729% -24.227% -23.740%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   
   subtract 512            time:   [498.20 ns 499.79 ns 501.36 ns]                          
                           change: [-25.168% -24.543% -23.948%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 4 outliers among 100 measurements (4.00%)
     2 (2.00%) high mild
     2 (2.00%) high severe
   
   multiply 512            time:   [498.28 ns 501.10 ns 504.14 ns]                          
                           change: [-32.237% -29.733% -27.551%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   divide 512              time:   [1.8751 us 1.8771 us 1.8790 us]                        
                           change: [-21.101% -20.410% -19.729%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 6 outliers among 100 measurements (6.00%)
     1 (1.00%) high mild
     5 (5.00%) high severe
   
   limit 512, 512          time:   [360.62 ns 362.85 ns 365.08 ns]                           
                           change: [-4.0917% -2.8282% -1.6589%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 1 outliers among 100 measurements (1.00%)
     1 (1.00%) high mild
   
   add_nulls_512           time:   [523.34 ns 525.34 ns 527.35 ns]                           
                           change: [-19.810% -19.242% -18.654%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 3 outliers among 100 measurements (3.00%)
     1 (1.00%) high mild
     2 (2.00%) high severe
   
   divide_nulls_512        time:   [1.8594 us 1.8606 us 1.8617 us]                              
                           change: [-21.900% -21.444% -20.974%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 8 outliers among 100 measurements (8.00%)
     2 (2.00%) high mild
     6 (6.00%) high severe
   ```
   
   Length (against the commit that fixes the bench, `16bc7200f3baa6e526aea7135c60dcc949c9b592`, not master):
   
   ```
   length                  time:   [1.5379 us 1.5408 us 1.5437 us]                    
                           change: [-97.311% -97.295% -97.278%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 12 outliers among 100 measurements (12.00%)
     1 (1.00%) low severe
     4 (4.00%) low mild
     3 (3.00%) high mild
     4 (4.00%) high severe
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r559224860



##########
File path: rust/arrow/src/compute/kernels/arithmetic.rs
##########
@@ -147,16 +143,17 @@ where
         .values()
         .iter()
         .zip(right.values().iter())
-        .map(|(l, r)| op(*l, *r))
-        .collect::<Vec<T::Native>>();
+        .map(|(l, r)| op(*l, *r));
+    let mut buffer = MutableBuffer::new(0);
+    buffer.extend_from_exact_iter(values);

Review comment:
       This is the main difference in this PR for the arithmetics: extend from the iterator instead of extending a vector and then memcopying the data.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r562855082



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+        let (lower, _) = iterator.size_hint();
+        let additional = lower * size;
+        self.reserve(additional);
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let mut len = SetLenOnDrop::new(&mut self.len);
+        let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
+        let capacity = self.capacity;
+
+        while len.local_len + size <= capacity {
+            if let Some(item) = iterator.next() {
+                unsafe {
+                    std::ptr::write(dst, item);
+                    dst = dst.add(1);
+                }
+                len.local_len += size;
+            } else {
+                break;
+            }
+        }
+        drop(len);
+
+        iterator.for_each(|item| self.push(item));
+    }
+}
+
+impl Buffer {
+    /// Creates a [`Buffer`] from an [`Iterator`] with a trusted (upper) length.
+    /// Prefer this to `collect` whenever possible, as it is faster ~60% faster.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::Buffer;
+    /// let v = vec![1u32];
+    /// let iter = v.iter().map(|x| x * 2);
+    /// let buffer = unsafe { Buffer::from_trusted_len_iter(iter) };
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    /// # Safety
+    /// This method assumes that the iterator's size is correct and is undefined behavior
+    /// to use it on an iterator that reports an incorrect length.
+    // This implementation is required for two reasons:
+    // 1. there is no trait `TrustedLen` in stable rust and therefore
+    //    we can't specialize `extend` for `TrustedLen` like `Vec` does.
+    // 2. `from_trusted_len_iter` is faster.
+    pub unsafe fn from_trusted_len_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        iterator: I,
+    ) -> Self {
+        let (_, upper) = iterator.size_hint();
+        let upper = upper.expect("from_trusted_len_iter requires an upper limit");
+        let len = upper * std::mem::size_of::<T>();
+
+        let mut buffer = MutableBuffer::new(len);
+
+        let mut dst = buffer.data.as_ptr() as *mut T;
+        for item in iterator {
+            // note how there is no reserve here (compared with `extend_from_iter`)
+            std::ptr::write(dst, item);
+            dst = dst.add(1);
+        }
+        buffer.len = len;

Review comment:
       Good idea. Done.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (23b99df) into [master](https://codecov.io/gh/apache/arrow/commit/08cccd68802c9ddc3ca0a5d4bad6e4ba382d74b4?el=desc) (08cccd6) will **decrease** coverage by `0.27%`.
   > The diff coverage is `88.00%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.81%   81.54%   -0.28%     
   ==========================================
     Files         214      215       +1     
     Lines       51373    51861     +488     
   ==========================================
   + Hits        42033    42289     +256     
   - Misses       9340     9572     +232     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/equal/utils.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZXF1YWwvdXRpbHMucnM=) | `74.50% <0.00%> (-0.25%)` | :arrow_down: |
   | [rust/arrow/src/bytes.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnl0ZXMucnM=) | `53.12% <ø> (-5.21%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/aggregate.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FnZ3JlZ2F0ZS5ycw==) | `74.93% <ø> (-0.07%)` | :arrow_down: |
   | [rust/arrow/src/datatypes.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvZGF0YXR5cGVzLnJz) | `78.47% <ø> (-0.28%)` | :arrow_down: |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `71.30% <0.00%> (-0.40%)` | :arrow_down: |
   | [rust/parquet/src/arrow/record\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9yZWNvcmRfcmVhZGVyLnJz) | `94.26% <46.15%> (-2.01%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2Jvb2xlYW4ucnM=) | `76.92% <50.00%> (ø)` | |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (-5.27%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/list.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2xpc3QucnM=) | `86.20% <66.66%> (+2.33%)` | :arrow_up: |
   | [rust/arrow/src/array/transform/utils.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3V0aWxzLnJz) | `95.00% <75.00%> (-5.00%)` | :arrow_down: |
   | ... and [96 more](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [08cccd6...23b99df](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561133856



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1008,7 +1008,7 @@ impl MutableBuffer {
         let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
 
         while let Some(item) = iterator.next() {
-            if len >= capacity {
+            if len + size >= capacity {

Review comment:
       change >= to >?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r559729105



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -758,39 +793,54 @@ impl MutableBuffer {
         }
     }
 
-    /// Ensures that this buffer has at least `capacity` slots in this buffer. This will
-    /// also ensure the new capacity will be a multiple of 64 bytes.
-    ///
-    /// Returns the new capacity for this buffer.
-    pub fn reserve(&mut self, capacity: usize) -> usize {
-        if capacity > self.capacity {
-            let new_capacity = bit_util::round_upto_multiple_of_64(capacity);
-            let new_capacity = cmp::max(new_capacity, self.capacity * 2);
-            self.data =
-                unsafe { memory::reallocate(self.data, self.capacity, new_capacity) };
-            self.capacity = new_capacity;
+    fn reserve_at_least(&mut self, capacity: usize) {
+        let new_capacity = bit_util::round_upto_multiple_of_64(capacity);
+        let new_capacity = std::cmp::max(new_capacity, self.capacity * 2);
+        self.data = unsafe { memory::reallocate(self.data, self.capacity, new_capacity) };
+        self.capacity = new_capacity;
+    }
+
+    /// Ensures that this buffer has at least `self.len + additional` bytes. This re-allocates iff
+    /// `self.len + additional > capacity`.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::{Buffer, MutableBuffer};
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.reserve(253); // allocates for the first time
+    /// (0..253u8).for_each(|i| buffer.push(i)); // no reallocation
+    /// let buffer: Buffer = buffer.into();
+    /// assert_eq!(buffer.len(), 253);
+    /// ```
+    // For performance reasons, this must be inlined so that the `if` is executed inside the caller, and not as an extra call that just
+    // exits.
+    #[inline(always)]
+    pub fn reserve(&mut self, additional: usize) {
+        let required_cap = self.len + additional;
+        if required_cap > self.capacity {
+            self.reserve_at_least(required_cap)
         }
-        self.capacity
     }
 
-    /// Resizes the buffer so that the `len` will equal to the `new_len`.
-    ///
-    /// If `new_len` is greater than `len`, the buffer's length is simply adjusted to be
-    /// the former, optionally extending the capacity. The data between `len` and
-    /// `new_len` will be zeroed out.
-    ///
-    /// If `new_len` is less than `len`, the buffer will be truncated.
-    pub fn resize(&mut self, new_len: usize) {
+    /// Resizes the buffer, either truncating its contents (with no change in capacity), or

Review comment:
       I opted for not doing so for the following reasons:
   
   * `Vec::resize` has no such argument
   * `Vec::resize` does not reallocate itself when `new_len < len`
   
   I.e. my general goal is to have `MutableBuffer`'s API as close as possible to `Vec` so that it can serve as a drop-in replacement that people can easily relate to.
   
   What could make sense would be to offer a `shrink_to_fit` method (which typically triggers a reallocation).
   I haven't seen a use case where `shrink_to_fit` was needed on a `MutableBuffer`, though. My understanding is that because Arrow buffers are immutable, people only use `MutableBuffer` to build new arrays, and, when building arrays, there is typically no use-case to shrink the buffer (i.e. in 100% of the cases in the crates so far we just grow them as needed).
   
   My understanding is that such memory pressure typically emerges in long-living mutable containers.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763878812


   I validated that with @mbrubeck 's change, we can just use `collect` and drop the `trusted_len` and `unsafe` hack with the same performance on the arithmetics 🎉 🎉🎉
   
   @mbrubeck , thank you so much for taking your time in addressing this; I have been really struggling with the compiler on this one and your implementation is exactly what I was trying to achieve in this PR: an efficient, `safe`, and expressive API for our users to build a `MutableBuffer`.
   
   When we commit to master we squash all commits. Would you be willing to take credit for this whole PR + your changes? IMO the most relevant part of this whole thing was your change, that enables `collect`.
   
   Alternatively, we could merge this PR as is, and then apply your changes on a separate one. Let me know what you prefer I will act accordingly (@alamb let me know if you see any problem here).


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine.  However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine.  However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.  Maybe the mutable borrow provides some additional hints to the optimizer.

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine, `SetLenOnDrop` is not actually necessary for soundness or correctness (as long as the pointer and capacity are still valid).
   
   However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.  Maybe the mutable borrow provides some additional hints to the optimizer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (d2c99cd) into [master](https://codecov.io/gh/apache/arrow/commit/c413566b34bd0c13a01a68148bd78df1bdec3c10?el=desc) (c413566) will **decrease** coverage by `0.00%`.
   > The diff coverage is `84.48%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.61%   81.61%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52508    52589      +81     
   ==========================================
   + Hits        42854    42918      +64     
   - Misses       9654     9671      +17     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/bytes.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnl0ZXMucnM=) | `53.12% <ø> (ø)` | |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.43% <81.17%> (-2.77%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.55% <95.83%> (-0.28%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [c413566...38b131d](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758


   I made a mistake in benchmarking and have to retract my statement. I am really sorry for the noise. :(
   
   I mistakenly updated the wrong part of the code before running the bench.
   
   @mbrubeck , even though there is a small difference between the the two extends in the benches that you added, when used in a non-trivial operation (`arithmetic`), `collect` is about +65% slower than using `extend_from_trusted_len_iter`.
   
   This hints that the `extend_trusted_len_iter` does have a significant benefit, if anything at helping the compiler optimize out some code.
   
   what I am using for the final benches:
   
   ```bash
   git checkout 7bb2f3c13f0e652b01e3fecbee0ae32ff094f3e6
   cargo bench --bench arithmetic_kernels -- "add 512"
   git checkout cdbbc6f9d81d37216dc158177405529988fa0279
   cargo bench --bench arithmetic_kernels -- "add 512"
   ```
   
   The difference in code is 
   
   ```rust
       let values = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r));
       let mut buffer = MutableBuffer::new(0);
       unsafe { buffer.extend_from_trusted_len_iter(values) };
   ```
   
   to 
   
   ```rust
       let buffer = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r))
           .collect();
   ```
   
   IMO this is interesting by itself.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,130 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len >= capacity {

Review comment:
       @jorgecarleitao Line 1011, the test condition may cause overflow, perhaps should increase `len` by `size` before the comparison. The following test shows this problem:
   
   ```rust
       #[test]
       fn mutable_extend_from_iter_write_overflow() {
           let mut buf = MutableBuffer::new(0);
   
           buf.extend(vec![1_u8; 63]);
           assert_eq!(63, buf.len());
           assert_eq!(64, buf.capacity());
   
           buf.extend(vec![1_u16; 1]);
           assert_eq!(65, buf.len());
           assert_eq!(64, buf.capacity());
       }
   ```
   
    I ran it my computer for several times, all passed, but you can see the `len` is bigger than `capacity`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine, `SetLenOnDrop` is not actually necessary for soundness or correctness (as long as the pointer and capacity are still valid).
   
   However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.  Maybe the mutable borrow provides some additional hints to the optimizer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758


   I made a mistake in benchmarking and have to retract my statement. I am really sorry for the noise. :(
   
   I mistakenly updated the wrong part of the code before running the bench.
   
   @mbrubeck , even though there is a small difference between the the two extends in the benches that you added, when used in a non-trivial operation (`arithmetic`), `collect` is about +65% slower than using `extend_from_trusted_len_iter`.
   
   This hints that the `extend_trusted_len_iter` does have a significant benefit, if anything at helping the compiler optimize out some code.
   
   what I am using for the final benches:
   
   ```bash
   git checkout 7bb2f3c13f0e652b01e3fecbee0ae32ff094f3e6
   cargo bench --bench arithmetic_kernels -- "add 512"
   git checkout cdbbc6f9d81d37216dc158177405529988fa0279
   cargo bench --bench arithmetic_kernels -- "add 512"
   ```
   
   The difference in code is 
   
   ```rust
       let values = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r));
       let mut buffer = MutableBuffer::new(0);
       unsafe { buffer.extend_from_trusted_len_iter(values) };
   ```
   
   to 
   
   ```rust
       let buffer = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r))
           .collect();
   ```
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763880774


   > When we commit to master we squash all commits. Would you be willing to take credit for this whole PR + your changes?
   
   I think it would be more appropriate to keep your name as the author, because you are more likely to be able to answer future questions about it, etc.  But feel free to credit me in the commit message, if you like.  Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (2b5cc6a) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **decrease** coverage by `0.00%`.
   > The diff coverage is `82.41%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.64%   81.64%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52489    52561      +72     
   ==========================================
   + Hits        42857    42914      +57     
   - Misses       9632     9647      +15     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.50% <80.00%> (-2.69%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.95% <100.00%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [a0e1244...f5413b1](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (38b131d) into [master](https://codecov.io/gh/apache/arrow/commit/c413566b34bd0c13a01a68148bd78df1bdec3c10?el=desc) (c413566) will **decrease** coverage by `0.00%`.
   > The diff coverage is `84.48%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.61%   81.61%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52508    52589      +81     
   ==========================================
   + Hits        42854    42918      +64     
   - Misses       9654     9671      +17     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/bytes.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnl0ZXMucnM=) | `53.12% <ø> (ø)` | |
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.43% <81.17%> (-2.77%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.55% <95.83%> (-0.28%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [c413566...38b131d](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       Thanks a lot. I do agree that this is unsound atm.
   
   Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`? I understood that the `len` was only needed because we could have to call drop on the elements up to `len`.
   
   With `SetLenOnDrop`, we borrow a mutable reference to `self.len`, which wont allow us to call `self.reserve` inside the `if`. Could this be the reason why `SetLenOnDrop` is not being used on the `extend_desugared` (which in my understanding this part of the code is reproducing)?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (98b4a9c) into [master](https://codecov.io/gh/apache/arrow/commit/a7633c7567947ad945011711bfb312497c0f1f94?el=desc) (a7633c7) will **decrease** coverage by `0.00%`.
   > The diff coverage is `81.11%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.62%   81.62%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52522    52591      +69     
   ==========================================
   + Hits        42872    42925      +53     
   - Misses       9650     9666      +16     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `92.62% <78.37%> (-2.81%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.55% <100.00%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/converter.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9jb252ZXJ0ZXIucnM=) | `73.04% <0.00%> (ø)` | |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `71.84% <0.00%> (+0.13%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.73% <0.00%> (+0.15%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [a7633c7...d175722](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r559224772



##########
File path: rust/arrow/src/compute/kernels/length.rs
##########
@@ -30,16 +29,16 @@ where
     OffsetSize: OffsetSizeTrait,
 {
     // note: offsets are stored as u8, but they can be interpreted as OffsetSize
-    let offsets = array.data_ref().clone().buffers()[0].clone();
+    let offsets = &array.data_ref().buffers()[0];
     // this is a 30% improvement over iterating over u8s and building OffsetSize, which
     // justifies the usage of `unsafe`.
     let slice: &[OffsetSize] =
         &unsafe { offsets.typed_data::<OffsetSize>() }[array.offset()..];
 
-    let lengths: Vec<OffsetSize> = slice
-        .windows(2)
-        .map(|offset| offset[1] - offset[0])
-        .collect();
+    let lengths = slice.windows(2).map(|offset| offset[1] - offset[0]);
+
+    let mut buffer = MutableBuffer::new(0);
+    buffer.extend_from_exact_iter(lengths);
 

Review comment:
       This is the main difference in this PR for the length: we extend from the iterator instead of extending a vector and then memcopying the data.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       Thanks a lot. I do agree that this is unsound atm.
   
   Note that arrow does not support complex structs on its buffers, which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`? I understood that the `len` was only needed because we could have to call drop on the elements up to `len`.
   
   With `SetLenOnDrop`, we borrow a mutable reference to `self.len`, which wont allow us to call `self.reserve` inside the `if`. Could this be the reason why `SetLenOnDrop` is not being used on the `extend_desugared` (which in my understanding this part of the code is reproducing)?
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,130 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len >= capacity {

Review comment:
       @jorgecarleitao Line 1011, the test condition may cause overflow, perhaps should increase `len` by `size` before the comparison. The following test shows this problem:
   
   ```rust
       #[test]
       fn mutable_extend_from_iter_write_overflow() {
           let mut buf = MutableBuffer::new(0);
   
           buf.extend(vec![1_u8; 63]);
           assert_eq!(63, buf.len());
           assert_eq!(64, buf.capacity());
   
           buf.extend(vec![1_u16; 1]);
           assert_eq!(65, buf.len());
           assert_eq!(64, buf.capacity());
       }
   ```
   
    I had run this test for several times, all passed, but you can see the `len` is bigger than `capacity`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763763668


   > Looks like there is an error in the SIMD tests https://github.com/apache/arrow/pull/9235/checks?
   
   `cargo test --doc --package arrow -- buffer::MutableBuffer::extend_from_trusted_len_iter --nocapture`  reproduces this error.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523


   I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts.  This version is only about 8% slower than `extend_from_trusted_len_iter`.  (Previously it was about 12% slower.)  Both are still much slower than `extend_from_slice`.
   
   <details>
   
   ```rust
   #[inline]
   fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
       &mut self,
       mut iterator: I,
   ) {
       let size = std::mem::size_of::<T>();
       let (lower, _) = iterator.size_hint();
       let additional = lower * size;
       self.reserve(additional);
   
       // this is necessary because of https://github.com/rust-lang/rust/issues/32155
       let mut len = SetLenOnDrop::new(&mut self.len);
       let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
       let capacity = self.capacity;
   
       while len.local_len + size <= capacity {
           if let Some(item) = iterator.next() {
               unsafe {
                   std::ptr::write(dst, item);
                   dst = dst.add(1);
               }
               len.local_len += size;
           } else {
               break;
           }
       }
       drop(len);
   
       iterator.for_each(|item| self.push(item));
   }
   ```
   </details>
   
   I haven't benchmarked the slow path (the second loop), which I implemented using safe code.  It's possible that it could be made faster with unsafe code, but I don't expect large gains there.
   
   My complete code including benchmarks is on this branch: https://github.com/apache/arrow/compare/master...mbrubeck:length_faster


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine.  However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.  Maybe the mutable borrow provides some additional hints to the optimizer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (cb02e45) into [master](https://codecov.io/gh/apache/arrow/commit/a7633c7567947ad945011711bfb312497c0f1f94?el=desc) (a7633c7) will **decrease** coverage by `0.00%`.
   > The diff coverage is `81.11%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.62%   81.62%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52522    52591      +69     
   ==========================================
   + Hits        42872    42925      +53     
   - Misses       9650     9666      +16     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `92.62% <78.37%> (-2.81%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.55% <100.00%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `94.86% <0.00%> (-0.20%)` | :arrow_down: |
   | [rust/parquet/src/arrow/converter.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9jb252ZXJ0ZXIucnM=) | `73.04% <0.00%> (ø)` | |
   | [rust/parquet/src/arrow/array\_reader.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJheV9yZWFkZXIucnM=) | `71.84% <0.00%> (+0.13%)` | :arrow_up: |
   | [rust/parquet/src/arrow/arrow\_writer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9hcnJvd193cml0ZXIucnM=) | `95.73% <0.00%> (+0.15%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [a7633c7...98b4a9c](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9235:
URL: https://github.com/apache/arrow/pull/9235


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758


   I made a mistake in benchmarking and have to retract my statement wrt to the `collect`. I am really sorry for the noise:
   
   @mbrubeck , even though there is a small difference between the the two extends in the individual benches, when used in a non-trivial operation (`arithmetic`), `collect` is about +65% slower than using `extend_from_trusted_len_iter`.
   
   This hints that the `extend_trusted_len_iter` does have a significant benefit, if anything at helping the compiler optimize out some code.
   
   what I am using for the final benches:
   
   ```bash
   git checkout 7bb2f3c13f0e652b01e3fecbee0ae32ff094f3e6
   cargo bench --bench arithmetic_kernels -- "add 512"
   git checkout cdbbc6f9d81d37216dc158177405529988fa0279
   cargo bench --bench arithmetic_kernels -- "add 512"
   ```
   
   The difference in code is 
   
   ```rust
       let values = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r));
       let mut buffer = MutableBuffer::new(0);
       unsafe { buffer.extend_from_trusted_len_iter(values) };
   ```
   
   to 
   
   ```rust
       let buffer = left
           .values()
           .iter()
           .zip(right.values().iter())
           .map(|(l, r)| op(*l, *r))
           .collect();
   ```
   
   IMO this is interesting by itself.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561163855



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       Thanks a lot. I do agree that this is unsound atm.
   
   Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`? I understood that the `len` was only needed because we could have to call drop on the elements up to `len`.
   
   With `SetLenOnDrop`, we borrow a mutable reference to `self.len`, which wont allow us to call `self.reserve` inside the `if`. Could this be the reason why `SetLenOnDrop` is not being used on the `extend_desugared` (which in my understanding this part of the code is reproducing)?
   
   (I am working on a fix)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561038192



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,130 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len >= capacity {

Review comment:
       @jorgecarleitao Line 1011, the test condition may cause overflow, perhaps should increase `len` by `size` before the comparison. The following test verified this problem:
   
       #[test]
       fn mutable_extend_from_iter_write_overflow() {
           let mut buf = MutableBuffer::new(0);
   
           buf.extend(vec![1_u8; 63]);
           assert_eq!(63, buf.len());
           assert_eq!(64, buf.capacity());
   
           buf.extend(vec![1_u16; 1]);
           assert_eq!(65, buf.len());
           assert_eq!(64, buf.capacity());
       }




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] codecov-io edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761864117


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=h1) Report
   > Merging [#9235](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=desc) (f5413b1) into [master](https://codecov.io/gh/apache/arrow/commit/a0e12445cc8689befe40c20e2fac6e6df252bef6?el=desc) (a0e1244) will **decrease** coverage by `0.00%`.
   > The diff coverage is `82.41%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9235/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #9235      +/-   ##
   ==========================================
   - Coverage   81.64%   81.64%   -0.01%     
   ==========================================
     Files         215      215              
     Lines       52489    52561      +72     
   ==========================================
   + Hits        42857    42915      +58     
   - Misses       9632     9646      +14     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/transform/fixed\_binary.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2ZpeGVkX2JpbmFyeS5ycw==) | `78.94% <50.00%> (ø)` | |
   | [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.50% <80.00%> (-2.69%)` | :arrow_down: |
   | [rust/arrow/src/array/transform/primitive.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL3ByaW1pdGl2ZS5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/arrow/src/compute/kernels/arithmetic.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2FyaXRobWV0aWMucnM=) | `89.95% <100.00%> (+0.12%)` | :arrow_up: |
   | [rust/arrow/src/compute/kernels/length.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2xlbmd0aC5ycw==) | `100.00% <100.00%> (ø)` | |
   | [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9235/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.43% <0.00%> (+0.19%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=footer). Last update [a0e1244...f5413b1](https://codecov.io/gh/apache/arrow/pull/9235?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-764944221


   I updated the description and this is now ready for a review.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-761861730


   https://issues.apache.org/jira/browse/ARROW-11291


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763314319


   I am drafting this because there is room for improvement in the API and soundness, details here https://users.rust-lang.org/t/collect-for-exactsizediterator/54367/15


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561239323



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       > Note that arrow does not support complex structs on its buffers (i.e. we only support `u8-u64, i8-i64, f32 and f64`), which means that we never need to call `drop` on the elements. Under this, do we still need a valid `len`?
   
   No.  Dropping before `self.len` is updated should be fine.  However, in my benchmarking I still found that using `SetLenOnDrop` provided a small performance benefit compared to just updating `self.len` after the loop.  I'm not sure why.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523


   I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts.  This version is only about 8% slower than `extend_from_trusted_len_iter`.  (Previously it was about 12% slower.)  Both are still much slower than `extend_from_slice`.
   
   <details>
   
   ```rust
   #[inline]
   fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
       &mut self,
       mut iterator: I,
   ) {
       let size = std::mem::size_of::<T>();
       let (lower, _) = iterator.size_hint();
       let additional = lower * size;
       self.reserve(additional);
   
       // this is necessary because of https://github.com/rust-lang/rust/issues/32155
       let mut len = SetLenOnDrop::new(&mut self.len);
       let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
       let capacity = self.capacity;
   
       while len.local_len + size <= capacity {
           if let Some(item) = iterator.next() {
               unsafe {
                   std::ptr::write(dst, item);
                   dst = dst.add(1);
               }
               len.local_len += size;
           } else {
               break;
           }
       }
       drop(len);
   
       iterator.for_each(|item| self.push(item));
   }
   ```
   </details>
   
   I haven't benchmarked the slow path (the second loop), which I implemented using safe code.  It's possible that it could be made faster with unsafe code, but I don't expect large gains there.
   
   My complete code including benchmarks is on this branch: https://github.com/apache/arrow/compare/master...mbrubeck:length_faster


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r561128399



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +968,131 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let (mut ptr, mut capacity, mut len) = (self.data, self.capacity, self.len);
+        let mut dst = unsafe { ptr.as_ptr().add(len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if len + size >= capacity {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                let (new_ptr, new_capacity) =
+                    unsafe { reallocate(ptr, capacity, len + additional) };

Review comment:
       This is unsound if unwinding is enabled, because `Iterator::next` may panic on a future iteration of the loop, causing `self` to be dropped while its `ptr` and `capacity` are incorrect.  This is why a drop guard like `SetLenOnDrop` is needed.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763651758


   Looks like there is an error in the SIMD tests https://github.com/apache/arrow/pull/9235/checks?check_run_id=1733044845
   
   ```
   
   failures:
   
   ---- src/buffer.rs - buffer::MutableBuffer::extend_from_trusted_len_iter (line 1036) stdout ----
   error[E0716]: temporary value dropped while borrowed
    --> src/buffer.rs:1038:12
     |
   5 | let iter = vec![1u32].iter().map(|x| x * 2);
     |            ^^^^^^^^^^                      - temporary value is freed at the end of this statement
     |            |
     |            creates a temporary which is freed while still in use
   6 | let mut buffer = MutableBuffer::new(0);
   7 | unsafe { buffer.extend_from_trusted_len_iter(iter) };
     |                                              ---- borrow later used here
     |
     = note: consider using a `let` binding to create a longer lived value
     = note: this error originates in a macro (in Nightly builds, run with -Z macro-backtrace for more info)
   
   error: aborting due to previous error
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-20% for arithmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r562596060



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1003,6 +1156,28 @@ impl PartialEq for MutableBuffer {
 unsafe impl Sync for MutableBuffer {}
 unsafe impl Send for MutableBuffer {}
 
+struct SetLenOnDrop<'a> {

Review comment:
       this construct is fascinating, but I will take your word that it was necessary for speedup

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.

Review comment:
       ```suggestion
   /// Ensures `ptr` points at at least `new_capacity` bytes, employing
   /// a doubling strategy if reallocate is needed.
   ///
   /// # Safety
   /// `ptr` must be allocated for `old_capacity`.
   ```

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+        let (lower, _) = iterator.size_hint();
+        let additional = lower * size;
+        self.reserve(additional);
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let mut len = SetLenOnDrop::new(&mut self.len);
+        let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
+        let capacity = self.capacity;
+
+        while len.local_len + size <= capacity {
+            if let Some(item) = iterator.next() {
+                unsafe {
+                    std::ptr::write(dst, item);
+                    dst = dst.add(1);
+                }
+                len.local_len += size;
+            } else {
+                break;
+            }
+        }
+        drop(len);
+
+        iterator.for_each(|item| self.push(item));
+    }
+}
+
+impl Buffer {
+    /// Creates a [`Buffer`] from an [`Iterator`] with a trusted (upper) length.
+    /// Prefer this to `collect` whenever possible, as it is faster ~60% faster.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::Buffer;
+    /// let v = vec![1u32];
+    /// let iter = v.iter().map(|x| x * 2);
+    /// let buffer = unsafe { Buffer::from_trusted_len_iter(iter) };
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    /// # Safety
+    /// This method assumes that the iterator's size is correct and is undefined behavior
+    /// to use it on an iterator that reports an incorrect length.
+    // This implementation is required for two reasons:
+    // 1. there is no trait `TrustedLen` in stable rust and therefore
+    //    we can't specialize `extend` for `TrustedLen` like `Vec` does.
+    // 2. `from_trusted_len_iter` is faster.
+    pub unsafe fn from_trusted_len_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        iterator: I,
+    ) -> Self {
+        let (_, upper) = iterator.size_hint();
+        let upper = upper.expect("from_trusted_len_iter requires an upper limit");
+        let len = upper * std::mem::size_of::<T>();
+
+        let mut buffer = MutableBuffer::new(len);
+
+        let mut dst = buffer.data.as_ptr() as *mut T;
+        for item in iterator {
+            // note how there is no reserve here (compared with `extend_from_iter`)
+            std::ptr::write(dst, item);
+            dst = dst.add(1);
+        }
+        buffer.len = len;

Review comment:
       ```suggestion
           assert_eq!(dst.offset_from(buffer.data.as_ptr() as *mut T) as usize, upper,
                   "Trusted iterator length was not accurately reported");
           buffer.len = len;
   ```

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+        let (lower, _) = iterator.size_hint();
+        let additional = lower * size;
+        self.reserve(additional);
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let mut len = SetLenOnDrop::new(&mut self.len);
+        let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
+        let capacity = self.capacity;
+
+        while len.local_len + size <= capacity {
+            if let Some(item) = iterator.next() {
+                unsafe {
+                    std::ptr::write(dst, item);
+                    dst = dst.add(1);
+                }
+                len.local_len += size;
+            } else {
+                break;
+            }
+        }
+        drop(len);
+
+        iterator.for_each(|item| self.push(item));
+    }
+}
+
+impl Buffer {
+    /// Creates a [`Buffer`] from an [`Iterator`] with a trusted (upper) length.
+    /// Prefer this to `collect` whenever possible, as it is faster ~60% faster.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::Buffer;
+    /// let v = vec![1u32];
+    /// let iter = v.iter().map(|x| x * 2);
+    /// let buffer = unsafe { Buffer::from_trusted_len_iter(iter) };
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    /// # Safety
+    /// This method assumes that the iterator's size is correct and is undefined behavior
+    /// to use it on an iterator that reports an incorrect length.
+    // This implementation is required for two reasons:
+    // 1. there is no trait `TrustedLen` in stable rust and therefore
+    //    we can't specialize `extend` for `TrustedLen` like `Vec` does.
+    // 2. `from_trusted_len_iter` is faster.
+    pub unsafe fn from_trusted_len_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        iterator: I,
+    ) -> Self {
+        let (_, upper) = iterator.size_hint();
+        let upper = upper.expect("from_trusted_len_iter requires an upper limit");
+        let len = upper * std::mem::size_of::<T>();
+
+        let mut buffer = MutableBuffer::new(len);
+
+        let mut dst = buffer.data.as_ptr() as *mut T;
+        for item in iterator {
+            // note how there is no reserve here (compared with `extend_from_iter`)
+            std::ptr::write(dst, item);
+            dst = dst.add(1);
+        }
+        buffer.len = len;

Review comment:
       The same comment for `try_from_trusted_len_iter`

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+        let (lower, _) = iterator.size_hint();
+        let additional = lower * size;
+        self.reserve(additional);
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let mut len = SetLenOnDrop::new(&mut self.len);
+        let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
+        let capacity = self.capacity;
+
+        while len.local_len + size <= capacity {
+            if let Some(item) = iterator.next() {
+                unsafe {
+                    std::ptr::write(dst, item);
+                    dst = dst.add(1);
+                }
+                len.local_len += size;
+            } else {
+                break;
+            }
+        }
+        drop(len);
+
+        iterator.for_each(|item| self.push(item));

Review comment:
       ```suggestion
           // Handle the case where size_hint was lower than actual number of items
           iterator.for_each(|item| self.push(item));
   ```

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -963,11 +970,157 @@ impl MutableBuffer {
 
     /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
     #[inline]
-    pub fn extend(&mut self, additional: usize) {
+    pub fn extend_zeros(&mut self, additional: usize) {
         self.resize(self.len + additional, 0);
     }
 }
 
+/// # Safety
+/// `ptr` must be allocated for `old_capacity`.
+#[inline]
+unsafe fn reallocate(
+    ptr: NonNull<u8>,
+    old_capacity: usize,
+    new_capacity: usize,
+) -> (NonNull<u8>, usize) {
+    let new_capacity = bit_util::round_upto_multiple_of_64(new_capacity);
+    let new_capacity = std::cmp::max(new_capacity, old_capacity * 2);
+    let ptr = memory::reallocate(ptr, old_capacity, new_capacity);
+    (ptr, new_capacity)
+}
+
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+impl MutableBuffer {
+    #[inline]
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+        let (lower, _) = iterator.size_hint();
+        let additional = lower * size;
+        self.reserve(additional);
+
+        // this is necessary because of https://github.com/rust-lang/rust/issues/32155
+        let mut len = SetLenOnDrop::new(&mut self.len);
+        let mut dst = unsafe { self.data.as_ptr().add(len.local_len) as *mut T };
+        let capacity = self.capacity;
+
+        while len.local_len + size <= capacity {
+            if let Some(item) = iterator.next() {
+                unsafe {
+                    std::ptr::write(dst, item);
+                    dst = dst.add(1);
+                }
+                len.local_len += size;
+            } else {
+                break;
+            }
+        }
+        drop(len);
+
+        iterator.for_each(|item| self.push(item));
+    }
+}
+
+impl Buffer {
+    /// Creates a [`Buffer`] from an [`Iterator`] with a trusted (upper) length.
+    /// Prefer this to `collect` whenever possible, as it is faster ~60% faster.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::Buffer;
+    /// let v = vec![1u32];
+    /// let iter = v.iter().map(|x| x * 2);
+    /// let buffer = unsafe { Buffer::from_trusted_len_iter(iter) };
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    /// # Safety
+    /// This method assumes that the iterator's size is correct and is undefined behavior
+    /// to use it on an iterator that reports an incorrect length.
+    // This implementation is required for two reasons:
+    // 1. there is no trait `TrustedLen` in stable rust and therefore
+    //    we can't specialize `extend` for `TrustedLen` like `Vec` does.
+    // 2. `from_trusted_len_iter` is faster.
+    pub unsafe fn from_trusted_len_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        iterator: I,
+    ) -> Self {
+        let (_, upper) = iterator.size_hint();
+        let upper = upper.expect("from_trusted_len_iter requires an upper limit");
+        let len = upper * std::mem::size_of::<T>();
+
+        let mut buffer = MutableBuffer::new(len);
+
+        let mut dst = buffer.data.as_ptr() as *mut T;
+        for item in iterator {
+            // note how there is no reserve here (compared with `extend_from_iter`)
+            std::ptr::write(dst, item);
+            dst = dst.add(1);
+        }
+        buffer.len = len;

Review comment:
       I think it would be valuable to assert here that the number of bytes written was *actually* equal to `upper` and thus panic if the invariant wasn't held (of course after memory had been overwritten and undefined behavior had begin). It might make bugs easier to catch.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mqy commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mqy commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r559460523



##########
File path: rust/arrow/src/array/array_primitive.rs
##########
@@ -295,20 +295,21 @@ impl<T: ArrowPrimitiveType, Ptr: Borrow<Option<<T as ArrowPrimitiveType>::Native
         let data_len = data_len.expect("Iterator must be sized"); // panic if no upper bound.
 
         let num_bytes = bit_util::ceil(data_len, 8);
-        let mut null_buf = MutableBuffer::new(num_bytes).with_bitset(num_bytes, false);
+        let mut null_buf = MutableBuffer::from_len_zeroed(num_bytes);
         let mut val_buf = MutableBuffer::new(
             data_len * mem::size_of::<<T as ArrowPrimitiveType>::Native>(),
         );
 
-        let null = vec![0; mem::size_of::<<T as ArrowPrimitiveType>::Native>()];
-
         let null_slice = null_buf.as_slice_mut();
         iter.enumerate().for_each(|(i, item)| {
             if let Some(a) = item.borrow() {
                 bit_util::set_bit(null_slice, i);
-                val_buf.extend_from_slice(a.to_byte_slice());
+                val_buf.push(*a);
             } else {
-                val_buf.extend_from_slice(&null);
+                // this ensures that null items on the buffer are not arbitrary.
+                // This is important because falible operations can use null values (e.g. a vectorized "add")

Review comment:
       falible -> fallible

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -758,39 +793,54 @@ impl MutableBuffer {
         }
     }
 
-    /// Ensures that this buffer has at least `capacity` slots in this buffer. This will
-    /// also ensure the new capacity will be a multiple of 64 bytes.
-    ///
-    /// Returns the new capacity for this buffer.
-    pub fn reserve(&mut self, capacity: usize) -> usize {
-        if capacity > self.capacity {
-            let new_capacity = bit_util::round_upto_multiple_of_64(capacity);
-            let new_capacity = cmp::max(new_capacity, self.capacity * 2);
-            self.data =
-                unsafe { memory::reallocate(self.data, self.capacity, new_capacity) };
-            self.capacity = new_capacity;
+    fn reserve_at_least(&mut self, capacity: usize) {
+        let new_capacity = bit_util::round_upto_multiple_of_64(capacity);
+        let new_capacity = std::cmp::max(new_capacity, self.capacity * 2);
+        self.data = unsafe { memory::reallocate(self.data, self.capacity, new_capacity) };
+        self.capacity = new_capacity;
+    }
+
+    /// Ensures that this buffer has at least `self.len + additional` bytes. This re-allocates iff
+    /// `self.len + additional > capacity`.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::{Buffer, MutableBuffer};
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.reserve(253); // allocates for the first time
+    /// (0..253u8).for_each(|i| buffer.push(i)); // no reallocation
+    /// let buffer: Buffer = buffer.into();
+    /// assert_eq!(buffer.len(), 253);
+    /// ```
+    // For performance reasons, this must be inlined so that the `if` is executed inside the caller, and not as an extra call that just
+    // exits.
+    #[inline(always)]
+    pub fn reserve(&mut self, additional: usize) {
+        let required_cap = self.len + additional;
+        if required_cap > self.capacity {
+            self.reserve_at_least(required_cap)
         }
-        self.capacity
     }
 
-    /// Resizes the buffer so that the `len` will equal to the `new_len`.
-    ///
-    /// If `new_len` is greater than `len`, the buffer's length is simply adjusted to be
-    /// the former, optionally extending the capacity. The data between `len` and
-    /// `new_len` will be zeroed out.
-    ///
-    /// If `new_len` is less than `len`, the buffer will be truncated.
-    pub fn resize(&mut self, new_len: usize) {
+    /// Resizes the buffer, either truncating its contents (with no change in capacity), or

Review comment:
       In order to reduce the memory pressure in some scenarios, perhaps it's sound to make the `grow-only(no shrink cap) behavior` configurable with an additional boolean argument (default true).

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -861,49 +913,147 @@ impl MutableBuffer {
         }
     }
 
-    /// View buffer as typed slice.
+    /// View this buffer asa slice of a specific type.
+    /// # Safety
+    /// This function must only be used when this buffer was extended with items of type `T`.
+    /// Failure to do so results in undefined behavior.
     pub fn typed_data_mut<T: ArrowNativeType>(&mut self) -> &mut [T] {
-        assert_eq!(self.len() % mem::size_of::<T>(), 0);
-        assert!(memory::is_ptr_aligned::<T>(self.data.cast()));
-        // JUSTIFICATION
-        //  Benefit
-        //      Many of the buffers represent specific types, and consumers of `Buffer` often need to re-interpret them.
-        //  Soundness
-        //      * The pointer is non-null by construction
-        //      * alignment asserted above
         unsafe {
-            std::slice::from_raw_parts_mut(
-                self.as_ptr() as *mut T,
-                self.len() / mem::size_of::<T>(),
-            )
+            let (prefix, offsets, suffix) = self.as_slice_mut().align_to_mut::<T>();
+            assert!(prefix.is_empty() && suffix.is_empty());
+            offsets
         }
     }
 
-    /// Extends the buffer from a byte slice, incrementing its capacity if needed.
-    #[inline]
-    pub fn extend_from_slice(&mut self, bytes: &[u8]) {
-        let new_len = self.len + bytes.len();
-        if new_len > self.capacity {
-            self.reserve(new_len);
+    /// Extends this buffer from a slice of items that can be represented in bytes, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.extend_from_slice(&[2u32, 0]);
+    /// assert_eq!(buffer.len(), 8) // u32 has 4 bytes
+    /// ```
+    pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) {
+        let len = items.len();
+        let additional = len * std::mem::size_of::<T>();
+        self.reserve(additional);
+        unsafe {
+            let dst = self.data.as_ptr().add(self.len);
+            let src = items.as_ptr() as *const u8;
+            std::ptr::copy_nonoverlapping(src, dst, additional)
         }
+        self.len += additional;
+    }
+
+    /// Extends the buffer with a new item, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.push(256u32);
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    #[inline]
+    pub fn push<T: ToByteSlice>(&mut self, item: T) {
+        let additional = std::mem::size_of::<T>();
+        self.reserve(additional);
         unsafe {
-            let dst = NonNull::new_unchecked(self.data.as_ptr().add(self.len));
-            let src = NonNull::new_unchecked(bytes.as_ptr() as *mut u8);
-            memory::memcpy(dst, src, bytes.len());
+            let dst = self.data.as_ptr().add(self.len) as *mut T;
+            std::ptr::write(dst, item);
         }
-        self.len = new_len;
+        self.len += additional;
+    }
+
+    /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
+    #[inline]
+    pub fn extend_zeros(&mut self, additional: usize) {
+        self.resize(self.len + additional, 0);
     }
+}
 
-    /// Extends the buffer by `len` with all bytes equal to `0u8`, incrementing its capacity if needed.
-    pub fn extend(&mut self, len: usize) {
-        let remaining_capacity = self.capacity - self.len;
-        if len > remaining_capacity {
-            self.reserve(self.len + len);
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    /// Extends [`MutableBuffer`] from an [`Iterator`]. If the iterator is an [`ExactSizeIterator`],
+    /// use [`extend_from_exact_iter`] as it is faster.
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+// This implementation is required for two reasons:
+// 1. In stable rust, there is no trait specialization and therefore
+//    we can't specialize `extend_from_iter` for `ExactSizeIterator` like `Vec` does.
+// 2. Extend for `ExactSizeIterator` is at least 2x faster (different scaling).
+// Unfortunately, there is no "FromExactSizeIterator", only `FromIterator`,
+// which means that `collect` is slow and we should avoid it to build a Buffer. This should

Review comment:
       "This should" what :D

##########
File path: rust/arrow/src/buffer.rs
##########
@@ -861,49 +913,147 @@ impl MutableBuffer {
         }
     }
 
-    /// View buffer as typed slice.
+    /// View this buffer asa slice of a specific type.
+    /// # Safety
+    /// This function must only be used when this buffer was extended with items of type `T`.
+    /// Failure to do so results in undefined behavior.
     pub fn typed_data_mut<T: ArrowNativeType>(&mut self) -> &mut [T] {
-        assert_eq!(self.len() % mem::size_of::<T>(), 0);
-        assert!(memory::is_ptr_aligned::<T>(self.data.cast()));
-        // JUSTIFICATION
-        //  Benefit
-        //      Many of the buffers represent specific types, and consumers of `Buffer` often need to re-interpret them.
-        //  Soundness
-        //      * The pointer is non-null by construction
-        //      * alignment asserted above
         unsafe {
-            std::slice::from_raw_parts_mut(
-                self.as_ptr() as *mut T,
-                self.len() / mem::size_of::<T>(),
-            )
+            let (prefix, offsets, suffix) = self.as_slice_mut().align_to_mut::<T>();
+            assert!(prefix.is_empty() && suffix.is_empty());
+            offsets
         }
     }
 
-    /// Extends the buffer from a byte slice, incrementing its capacity if needed.
-    #[inline]
-    pub fn extend_from_slice(&mut self, bytes: &[u8]) {
-        let new_len = self.len + bytes.len();
-        if new_len > self.capacity {
-            self.reserve(new_len);
+    /// Extends this buffer from a slice of items that can be represented in bytes, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.extend_from_slice(&[2u32, 0]);
+    /// assert_eq!(buffer.len(), 8) // u32 has 4 bytes
+    /// ```
+    pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) {
+        let len = items.len();
+        let additional = len * std::mem::size_of::<T>();
+        self.reserve(additional);
+        unsafe {
+            let dst = self.data.as_ptr().add(self.len);
+            let src = items.as_ptr() as *const u8;
+            std::ptr::copy_nonoverlapping(src, dst, additional)
         }
+        self.len += additional;
+    }
+
+    /// Extends the buffer with a new item, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.push(256u32);
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    #[inline]
+    pub fn push<T: ToByteSlice>(&mut self, item: T) {
+        let additional = std::mem::size_of::<T>();
+        self.reserve(additional);
         unsafe {
-            let dst = NonNull::new_unchecked(self.data.as_ptr().add(self.len));
-            let src = NonNull::new_unchecked(bytes.as_ptr() as *mut u8);
-            memory::memcpy(dst, src, bytes.len());
+            let dst = self.data.as_ptr().add(self.len) as *mut T;
+            std::ptr::write(dst, item);
         }
-        self.len = new_len;
+        self.len += additional;
+    }
+
+    /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
+    #[inline]
+    pub fn extend_zeros(&mut self, additional: usize) {
+        self.resize(self.len + additional, 0);
     }
+}
 
-    /// Extends the buffer by `len` with all bytes equal to `0u8`, incrementing its capacity if needed.
-    pub fn extend(&mut self, len: usize) {
-        let remaining_capacity = self.capacity - self.len;
-        if len > remaining_capacity {
-            self.reserve(self.len + len);
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    /// Extends [`MutableBuffer`] from an [`Iterator`]. If the iterator is an [`ExactSizeIterator`],
+    /// use [`extend_from_exact_iter`] as it is faster.
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+// This implementation is required for two reasons:
+// 1. In stable rust, there is no trait specialization and therefore
+//    we can't specialize `extend_from_iter` for `ExactSizeIterator` like `Vec` does.
+// 2. Extend for `ExactSizeIterator` is at least 2x faster (different scaling).
+// Unfortunately, there is no "FromExactSizeIterator", only `FromIterator`,
+// which means that `collect` is slow and we should avoid it to build a Buffer. This should
+impl MutableBuffer {
+    /// This method is private because the public interface is [`extend`]
+    /// It is here because it is similar to `extend_from_exact_iter`.
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        let (lower, _) = iterator.size_hint();
+
+        self.reserve(lower * size);
+        let mut dst = unsafe { self.data.as_ptr().add(self.len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if self.len >= self.capacity() {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                self.reserve_at_least(self.len + additional);

Review comment:
       `reserve_at_least` may be called many times, any chance to calculating the total additional before calling to `reserve_at_least`?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763898758






----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-762400218


   On my computer this is now faster than simd in master 🤣


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#discussion_r559736271



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -861,49 +913,147 @@ impl MutableBuffer {
         }
     }
 
-    /// View buffer as typed slice.
+    /// View this buffer asa slice of a specific type.
+    /// # Safety
+    /// This function must only be used when this buffer was extended with items of type `T`.
+    /// Failure to do so results in undefined behavior.
     pub fn typed_data_mut<T: ArrowNativeType>(&mut self) -> &mut [T] {
-        assert_eq!(self.len() % mem::size_of::<T>(), 0);
-        assert!(memory::is_ptr_aligned::<T>(self.data.cast()));
-        // JUSTIFICATION
-        //  Benefit
-        //      Many of the buffers represent specific types, and consumers of `Buffer` often need to re-interpret them.
-        //  Soundness
-        //      * The pointer is non-null by construction
-        //      * alignment asserted above
         unsafe {
-            std::slice::from_raw_parts_mut(
-                self.as_ptr() as *mut T,
-                self.len() / mem::size_of::<T>(),
-            )
+            let (prefix, offsets, suffix) = self.as_slice_mut().align_to_mut::<T>();
+            assert!(prefix.is_empty() && suffix.is_empty());
+            offsets
         }
     }
 
-    /// Extends the buffer from a byte slice, incrementing its capacity if needed.
-    #[inline]
-    pub fn extend_from_slice(&mut self, bytes: &[u8]) {
-        let new_len = self.len + bytes.len();
-        if new_len > self.capacity {
-            self.reserve(new_len);
+    /// Extends this buffer from a slice of items that can be represented in bytes, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.extend_from_slice(&[2u32, 0]);
+    /// assert_eq!(buffer.len(), 8) // u32 has 4 bytes
+    /// ```
+    pub fn extend_from_slice<T: ToByteSlice>(&mut self, items: &[T]) {
+        let len = items.len();
+        let additional = len * std::mem::size_of::<T>();
+        self.reserve(additional);
+        unsafe {
+            let dst = self.data.as_ptr().add(self.len);
+            let src = items.as_ptr() as *const u8;
+            std::ptr::copy_nonoverlapping(src, dst, additional)
         }
+        self.len += additional;
+    }
+
+    /// Extends the buffer with a new item, increasing its capacity if needed.
+    /// # Example
+    /// ```
+    /// # use arrow::buffer::MutableBuffer;
+    /// let mut buffer = MutableBuffer::new(0);
+    /// buffer.push(256u32);
+    /// assert_eq!(buffer.len(), 4) // u32 has 4 bytes
+    /// ```
+    #[inline]
+    pub fn push<T: ToByteSlice>(&mut self, item: T) {
+        let additional = std::mem::size_of::<T>();
+        self.reserve(additional);
         unsafe {
-            let dst = NonNull::new_unchecked(self.data.as_ptr().add(self.len));
-            let src = NonNull::new_unchecked(bytes.as_ptr() as *mut u8);
-            memory::memcpy(dst, src, bytes.len());
+            let dst = self.data.as_ptr().add(self.len) as *mut T;
+            std::ptr::write(dst, item);
         }
-        self.len = new_len;
+        self.len += additional;
+    }
+
+    /// Extends the buffer by `additional` bytes equal to `0u8`, incrementing its capacity if needed.
+    #[inline]
+    pub fn extend_zeros(&mut self, additional: usize) {
+        self.resize(self.len + additional, 0);
     }
+}
 
-    /// Extends the buffer by `len` with all bytes equal to `0u8`, incrementing its capacity if needed.
-    pub fn extend(&mut self, len: usize) {
-        let remaining_capacity = self.capacity - self.len;
-        if len > remaining_capacity {
-            self.reserve(self.len + len);
+impl<A: ArrowNativeType> Extend<A> for MutableBuffer {
+    /// Extends [`MutableBuffer`] from an [`Iterator`]. If the iterator is an [`ExactSizeIterator`],
+    /// use [`extend_from_exact_iter`] as it is faster.
+    #[inline]
+    fn extend<T: IntoIterator<Item = A>>(&mut self, iter: T) {
+        let iterator = iter.into_iter();
+        self.extend_from_iter(iterator)
+    }
+}
+
+// This implementation is required for two reasons:
+// 1. In stable rust, there is no trait specialization and therefore
+//    we can't specialize `extend_from_iter` for `ExactSizeIterator` like `Vec` does.
+// 2. Extend for `ExactSizeIterator` is at least 2x faster (different scaling).
+// Unfortunately, there is no "FromExactSizeIterator", only `FromIterator`,
+// which means that `collect` is slow and we should avoid it to build a Buffer. This should
+impl MutableBuffer {
+    /// This method is private because the public interface is [`extend`]
+    /// It is here because it is similar to `extend_from_exact_iter`.
+    fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
+        &mut self,
+        mut iterator: I,
+    ) {
+        let size = std::mem::size_of::<T>();
+
+        let (lower, _) = iterator.size_hint();
+
+        self.reserve(lower * size);
+        let mut dst = unsafe { self.data.as_ptr().add(self.len) as *mut T };
+
+        while let Some(item) = iterator.next() {
+            if self.len >= self.capacity() {
+                let (lower, _) = iterator.size_hint();
+                let additional = (lower + 1) * size;
+                self.reserve_at_least(self.len + additional);

Review comment:
       I am sorry, I did not understand your question.
   
   This is being called here because iterators do not have a reliable `size_hint` (thus the `+ 1`). This code is spiritually equal to `extend_desugared`  in the standard lib.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] mbrubeck edited a comment on pull request #9235: ARROW-11291: [Rust] Add extend to MutableBuffer (-25% for aritmetic, -97% for length)

Posted by GitBox <gi...@apache.org>.
mbrubeck edited a comment on pull request #9235:
URL: https://github.com/apache/arrow/pull/9235#issuecomment-763846523


   I was able to speed up the fast path of `extend_from_iter` by a few percent, by splitting the loop into two parts.  This version is only about 8% slower than `extend_from_trusted_len_iter`.  (Previously it was about 12% slower.)  Both are still much slower than `extend_from_slice`.
   
   <details>
   
   ```rust
   #[inline]
   fn extend_from_iter<T: ArrowNativeType, I: Iterator<Item = T>>(
       &mut self,
       mut iterator: I,
   ) {
       let size = std::mem::size_of::<T>();
       let (lower, _) = iterator.size_hint();
       let additional = lower * size;
       self.reserve(additional);
   
       // this is necessary because of https://github.com/rust-lang/rust/issues/32155
       let mut len = self.len;
       let mut dst = unsafe { self.data.as_ptr().add(len) as *mut T };
       let capacity = self.capacity;
   
       while len + size <= capacity {
           if let Some(item) = iterator.next() {
               unsafe {
                   std::ptr::write(dst, item);
                   dst = dst.add(1);
               }
               len += size;
           } else {
               break;
           }
       }
       self.len = len;
   
       iterator.for_each(|item| self.push(item));
   }
   ```
   </details>
   
   I haven't benchmarked the slow path (the second loop), which I implemented using safe code.  It's possible that it could be made faster with unsafe code, but I don't expect large gains there.
   
   My complete code including benchmarks is on this branch: https://github.com/apache/arrow/compare/master...mbrubeck:length_faster
   
   EDIT:  I updated my code to remove `SetLenOnDrop`.  I realized it isn't needed, because `T` must be `Copy`, so we don't need to worry about leaking it on panic.  Instead, we can just set the length manually after the loop.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org