
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/28 21:15:54 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #8796: [Rust] [Experiment] Vec vs current allocations

jorgecarleitao opened a new pull request #8796:
URL: https://github.com/apache/arrow/pull/8796


   @nevi-me , @alamb  @jhorstmann , I have been playing around with the buffers on the arrow crate, and just for fun, tried to replace all our `memory` logic with a simple `Vec<u8>`. Perhaps unsurprisingly to you, but a bit surprisingly to me, this leads to a significant improvement on almost all benches. I.e. even though memory alignment is good for some kernels, overall our allocations and memory handling seem to be much worse than `Vec`. 
   
   I am not proposing that we drop the alignment over cache lines, as it is theoretically more sound. However, practically (and based on our microbenchmarks alone), there seems to be a good case here. Maybe this behavior is different if we use the `simd` feature gate?
   
   Here are the results ordered from worst to best (results that are not significant are not shown):
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | nlike_utf8 scalar ends with | 15.8 | 
   | sum nulls 512 | 13.5 | 
   | struct_array_from_vec 1024 | 8.0 | 
   | array_slice 512 | 6.9 | 
   | nlike_utf8 scalar contains | 5.7 | 
   | cast timestamp_ns to timestamp_s 512 | 5.1 | 
   | record_batches_to_csv | 3.9 | 
   | sort nulls 2^12 | 3.5 | 
   | sort nulls 2^10 | 3.0 | 
   | min string 512 | 3.0 | 
   | cast timestamp_ms to i64 512 | 2.3 | 
   | nlike_utf8 scalar equals | 2.1 | 
   | struct_array_from_vec 512 | 1.8 | 
   | take str 1024 | 1.4 | 
   | nlike_utf8 scalar complex | 1.3 | 
   | array_slice 128 | 0.9 | 
   | like_utf8 scalar complex | 0.5 | 
   | like_utf8 scalar contains | 0.5 | 
   | min 512 | -1.0 | 
   | sort 2^12 | -1.1 | 
   | sort 2^10 | -1.1 | 
   | like_utf8 scalar equals | -1.3 | 
   | like_utf8 scalar starts with | -1.4 | 
   | limit 512, 512 | -1.4 | 
   | cast time32s to time32ms 512 | -1.9 | 
   | subtract 512 | -2.2 | 
   | filter context f32 high selectivity | -2.2 | 
   | add 512 | -2.7 | 
   | struct_array_from_vec 256 | -2.7 | 
   | divide_nulls_512 | -2.9 | 
   | sum 512 | -3.0 | 
   | add_nulls_512 | -3.1 | 
   | take str nulls 512 | -3.5 | 
   | multiply 512 | -3.6 | 
   | filter context u8 very low selectivity | -3.9 | 
   | array_slice 2048 | -4.3 | 
   | cast date64 to date32 512 | -4.5 | 
   | take i32 nulls 1024 | -4.8 | 
   | min nulls string 512 | -5.1 | 
   | take i32 1024 | -5.3 | 
   | array_string_from_vec 256 | -5.5 | 
   | array_string_from_vec 128 | -5.7 | 
   | filter context u8 w NULLs very low selectivity | -6.4 | 
   | filter context u8 low selectivity | -6.6 | 
   | filter u8 high selectivity | -7.1 | 
   | filter context u8 w NULLs high selectivity | -7.2 | 
   | filter u8 very low selectivity | -7.4 | 
   | struct_array_from_vec 128 | -7.4 | 
   | cast int64 to int32 512 | -7.8 | 
   | cast date32 to date64 512 | -8.2 | 
   | take i32 nulls 512 | -8.3 | 
   | equal_string_nulls_512 | -8.4 | 
   | take i32 512 | -8.5 | 
   | buffer_bit_ops and | -9.4 | 
   | equal_512 | -9.5 | 
   | take str 512 | -9.6 | 
   | cast time64ns to time32s 512 | -9.7 | 
   | take bool 1024 | -10.2 | 
   | filter context u8 high selectivity | -11.2 | 
   | filter u8 low selectivity | -11.2 | 
   | equal_string_512 | -12.2 | 
   | array_from_vec 256 | -12.5 | 
   | take bool 512 | -12.8 | 
   | cast time32s to time64us 512 | -15.6 | 
   | buffer_bit_ops or | -17.0 | 
   | eq scalar Float32 | -17.6 | 
   | lt_eq scalar Float32 | -17.9 | 
   | lt scalar Float32 | -18.2 | 
   | array_from_vec 512 | -19.4 | 
   | gt_eq scalar Float32 | -19.5 | 
   | take bool nulls 1024 | -19.7 | 
   | lt_eq Float32 | -19.8 | 
   | eq Float32 | -19.9 | 
   | gt_eq Float32 | -20.2 | 
   | filter context u8 w NULLs low selectivity | -20.4 | 
   | neq scalar Float32 | -21.1 | 
   | gt scalar Float32 | -21.5 | 
   | and | -21.8 | 
   | or | -22.1 | 
   | not | -22.6 | 
   | take bool nulls 512 | -22.7 | 
   | cast int32 to int64 512 | -23.0 | 
   | min nulls 512 | -23.2 | 
   | array_from_vec 128 | -23.4 | 
   | cast float64 to uint64 512 | -24.3 | 
   | neq Float32 | -24.8 | 
   | lt Float32 | -24.9 | 
   | gt Float32 | -25.6 | 
   | cast float64 to float32 512 | -25.9 | 
   | cast int32 to float64 512 | -27.6 | 
   | equal_nulls_512 | -28.0 | 
   | cast int32 to uint32 512 | -30.4 | 
   | cast int32 to float32 512 | -33.3 | 
   | cast float32 to int32 512 | -35.0 |


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow] TimDiekmann commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

TimDiekmann commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735368785


   > It seems rust will soon get support for [custom allocators for `Vec`](https://github.com/rust-lang/rust/pull/78461), that way we could get both a simplified internal api and still ensure padded allocations using a custom allocator.
   
   Let me hook in here for a moment. Although it is true that custom allocators for `Vec` are now implemented, be aware that this is not yet stable and the API may undergo some changes. Without `#![feature(allocator_api)]` it's not even possible to declare `Vec<T, _>` or `Vec<T, Global>`.



[GitHub] [arrow] jhorstmann commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jhorstmann commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735366043

I'm surprised too. It might just be the removed assertions, but I did not expect them to have measurable overhead. If you want to investigate further, you could try removing only those, or replacing them with `debug_assert`. I can't easily reproduce it on my notebook since the variations per run are too high; I would need to spin up another EC2 instance to run stable benchmarks.
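
As an illustration of the suggestion above, here is a minimal sketch of the two kinds of check. The functions `get_checked` and `get_debug_checked` are hypothetical names made up for this example and are not part of the arrow crate. `assert!` is always compiled in, while `debug_assert!` is only evaluated when debug assertions are enabled, which by default is not the case in release (and therefore bench) builds.

```rust
/// Bounds check that is always present, including in release builds.
fn get_checked(data: &[u8], i: usize) -> u8 {
    assert!(i < data.len(), "index {} out of bounds", i);
    // SAFETY: the assertion above guarantees that `i` is in bounds.
    unsafe { *data.get_unchecked(i) }
}

/// Bounds check that only runs when debug assertions are enabled.
///
/// # Safety
/// The caller must guarantee `i < data.len()`, because in a release
/// build nothing verifies it.
unsafe fn get_debug_checked(data: &[u8], i: usize) -> u8 {
    debug_assert!(i < data.len(), "index {} out of bounds", i);
    // SAFETY: upheld by the caller, see the function contract.
    unsafe { *data.get_unchecked(i) }
}
```

Benchmarking the same kernel with each variant would be one way to isolate the cost of the assertions from the cost of the allocation strategy.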

Reading the [columnar specification][1] again, the alignment is only a recommendation, and only required when serialized. I'm not familiar with that part of the code, but I assume it already needs to ensure the required padding.

The only problem I could see would be with shared memory or FFI if the other side relies on the padding. I think it already can't rely on 64-byte alignment, because arrays can be arbitrary slices of the underlying buffers. But relying on padding could happen when accessing data using vector instructions.
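
To show what "padding" means here, the following is a small sketch that rounds a buffer length up to a multiple of 64 bytes and zero-fills the tail. The helper names `padded_len` and `to_padded` are assumptions for this example, not functions from the arrow crate, and the sketch ignores overflow for lengths close to `usize::MAX`.

```rust
/// Rounds `len` up to the next multiple of 64 bytes.
fn padded_len(len: usize) -> usize {
    const PAD: usize = 64;
    (len + PAD - 1) / PAD * PAD
}

/// Copies `values` into a zero-filled buffer whose length is a multiple
/// of 64, so that reading whole 64-byte chunks stays in bounds.
fn to_padded(values: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; padded_len(values.len())];
    out[..values.len()].copy_from_slice(values);
    out
}
```

A consumer that loads 64 bytes at a time can then read the final partial chunk without going past the end of the allocation, which is the property a plain `Vec<u8>` of exact length would not provide.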

It seems rust will soon get support for [custom allocators for `Vec`][2], that way we could get both a simplified internal api and still ensure padded allocations using a custom allocator.

[1]: https://arrow.apache.org/docs/format/Columnar.html#buffer-alignment-and-padding
[2]: https://github.com/rust-lang/rust/pull/78461


[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-745124594


   Small update: this is blocked by segfaults coming from the SIMD implementation, as you can see from the logs on the test of the SIMD feature.
   
   I think that they come from the `sum`.
   
   I thought that the SIMD implementation would not make assumptions about memory alignment or minimum buffer size. This needs some investigation.



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-745186040


   > @jorgecarleitao I had a quick look at the failing test, that one actually uses the addition kernel to prepare test data and I think that is where the problem is. I'll try to find time to have a deeper look at it today.
   
   Thank you so much, @jhorstmann , really appreciated.



[GitHub] [arrow] alamb commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

alamb commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-745583654


   Possibly related to https://github.com/apache/arrow/pull/8929



[GitHub] [arrow] Dandandan edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

Dandandan edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-736283187


   @jorgecarleitao 
   Maybe I'm saying something weird/impossible, but would it also be possible/beneficial to store the buffer in a `Vec<T>`? 
   This way it could simplify mutation of the buffer for the different types, while also relying less on unsafe code / code that could segfault or lead to other errors when used wrongly. In profiling/benchmarks I saw there are major inefficiencies related to writing values as individual bytes instead of being able to store them directly in the builder API (e.g. in the append function).
   
   For the rest, I think it really makes sense to push this idea forward, as the current implementation is much more complicated without a good reason. I think that with `Vec` it will actually be easier to optimize for performance. I agree with @alamb that getting rid of the other code is beneficial; the benchmarks at least don't show clear regressions.
   
   Really look forward to those benchmarks too @alamb !



[GitHub] [arrow] ritchie46 edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

ritchie46 edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735378815


   Do these benchmarks also include writing to the buffers, or only allocating/deallocating? 
   
   I ask because I already used some sort of wrapper around a `Vec`. It is just a wrapper around a Rust `Vec` that overrides `reserve` to use Arrow's memory-aligned allocation.
   
   However, for writing and indexing it uses the default `Vec` methods. I found this to be a lot faster for creating Arrow arrays. I use it when I know I don't have null values.



[GitHub] [arrow] Dandandan commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

Dandandan commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735379511


   I did some profiling with Valgrind yesterday. I don't currently have a very good knowledge of the workings of `memory`, but there I saw as well that some of the lines in the code related to `MutableBuffer::extend_from_slice` stand out in terms of instruction cycles, as they are called a lot in the code / kernels. I think a few "micro optimizations" are possible there. 
   
   It makes sense to me that switching to `Vec<u8>` benefits from it being optimized/benchmarked already extremely well.



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-748470192


   I have now rebased this against master. After @jhorstmann's fix to the out-of-bounds access in #8954, it now runs correctly.
   
   Here are the results:
   
   # no SIMD
   
   ```
   git checkout master
   cargo bench --benches
   git checkout buffer2
   cargo bench --benches
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | cast date64 to date32 512 | 63.2 | 
   | cast int32 to float64 512 | 52.4 | 
   | cast float64 to float32 512 | 43.0 | 
   | cast time64ns to time32s 512 | 42.9 | 
   | filter context f32 very low selectivity | 40.5 | 
   | cast date32 to date64 512 | 37.0 | 
   | cast int32 to float32 512 | 35.5 | 
   | struct_array_from_vec 1024 | 32.6 | 
   | lt scalar Float32 | 32.4 | 
   | concat str 1024 | 31.6 | 
   | cast int32 to int64 512 | 29.8 | 
   | cast float64 to uint64 512 | 27.7 | 
   | filter context u8 very low selectivity | 26.8 | 
   | take str 1024 | 26.6 | 
   | concat str nulls 1024 | 26.3 | 
   | take str null indices 1024 | 25.2 | 
   | take str null values 1024 | 25.1 | 
   | struct_array_from_vec 512 | 24.8 | 
   | cast float32 to int32 512 | 20.3 | 
   | filter u8 very low selectivity | 19.7 | 
   | take str null indices 512 | 19.2 | 
   | take str 512 | 18.7 | 
   | cast time32s to time64us 512 | 17.3 | 
   | nlike_utf8 scalar equals | 16.1 | 
   | struct_array_from_vec 256 | 15.7 | 
   | nlike_utf8 scalar ends with | 13.4 | 
   | take i32 nulls 512 | 13.0 | 
   | take i32 512 | 11.7 | 
   | take i32 nulls 1024 | 11.1 | 
   | take str null values null indices 1024 | 10.3 | 
   | take i32 1024 | 10.2 | 
   | filter context u8 low selectivity | 10.0 | 
   | cast int32 to uint32 512 | 9.7 | 
   | filter u8 low selectivity | 9.4 | 
   | filter context u8 w NULLs low selectivity | 7.4 | 
   | min string 512 | 6.9 | 
   | like_utf8 scalar equals | 6.5 | 
   | like_utf8 scalar complex | 6.0 | 
   | filter context f32 high selectivity | 5.6 | 
   | min nulls 512 | 5.3 | 
   | like_utf8 scalar contains | 5.1 | 
   | divide 512 | 4.8 | 
   | nlike_utf8 scalar contains | 4.8 | 
   | concat i32 1024 | 4.5 | 
   | divide_nulls_512 | 4.4 | 
   | struct_array_from_vec 128 | 4.2 | 
   | take bool nulls 512 | 2.2 | 
   | equal_512 | 2.0 | 
   | take bool 512 | 1.4 | 
   | min nulls string 512 | 1.4 | 
   | array_string_from_vec 512 | 1.1 | 
   | nlike_utf8 scalar complex | 0.9 | 
   | gt_eq scalar Float32 | 0.9 | 
   | eq scalar Float32 | 0.7 | 
   | cast int64 to int32 512 | 0.7 | 
   | cast int32 to int32 512 | 0.5 | 
   | neq scalar Float32 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | 0.2 | 
   | gt Float32 | -0.3 | 
   | min 512 | -0.3 | 
   | lt Float32 | -0.4 | 
   | sort nulls 2^12 | -0.4 | 
   | sort 2^12 | -0.6 | 
   | not | -0.8 | 
   | array_string_from_vec 256 | -0.9 | 
   | sort nulls 2^10 | -1.1 | 
   | nlike_utf8 scalar starts with | -1.5 | 
   | max nulls 512 | -1.7 | 
   | array_slice 512 | -1.7 | 
   | filter context u8 w NULLs very low selectivity | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.8 | 
   | length | -2.0 | 
   | and | -2.3 | 
   | or | -2.6 | 
   | add 512 | -2.6 | 
   | like_utf8 scalar starts with | -2.9 | 
   | limit 512, 512 | -3.4 | 
   | multiply 512 | -3.5 | 
   | cast time32s to time32ms 512 | -3.6 | 
   | array_from_vec 512 | -3.7 | 
   | subtract 512 | -3.9 | 
   | take bool nulls 1024 | -4.3 | 
   | add_nulls_512 | -4.9 | 
   | cast timestamp_ms to timestamp_ns 512 | -5.2 | 
   | sum 512 | -5.3 | 
   | like_utf8 scalar ends with | -6.1 | 
   | array_string_from_vec 128 | -6.2 | 
   | filter context f32 low selectivity | -8.2 | 
   | buffer_bit_ops or | -9.0 | 
   | equal_string_nulls_512 | -9.2 | 
   | array_from_vec 256 | -9.3 | 
   | array_from_vec 128 | -10.5 | 
   | equal_string_512 | -12.1 | 
   | filter context u8 high selectivity | -12.2 | 
   | filter u8 high selectivity | -12.8 | 
   | buffer_bit_ops and | -13.0 | 
   | take bool 1024 | -13.3 | 
   | filter context u8 w NULLs high selectivity | -13.3 | 
   | sum nulls 512 | -14.6 | 
   
   # SIMD
   
   ```
   git checkout master
   cargo bench --benches --features simd
   git checkout buffer2
   cargo bench --benches --features simd
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | cast date64 to date32 512 | 64.0 | 
   | cast date32 to date64 512 | 49.5 | 
   | cast float64 to float32 512 | 46.1 | 
   | cast time64ns to time32s 512 | 44.2 | 
   | filter context f32 very low selectivity | 43.9 | 
   | lt scalar Float32 | 39.1 | 
   | lt_eq Float32 | 38.7 | 
   | cast int32 to int64 512 | 35.7 | 
   | struct_array_from_vec 1024 | 35.4 | 
   | lt_eq scalar Float32 | 35.4 | 
   | eq scalar Float32 | 34.2 | 
   | neq Float32 | 32.1 | 
   | cast int32 to float64 512 | 31.6 | 
   | concat str 1024 | 31.4 | 
   | gt Float32 | 30.3 | 
   | neq scalar Float32 | 29.8 | 
   | filter context u8 very low selectivity | 28.2 | 
   | equal_nulls_512 | 27.3 | 
   | eq Float32 | 27.1 | 
   | struct_array_from_vec 512 | 26.1 | 
   | cast float64 to uint64 512 | 25.9 | 
   | lt Float32 | 24.8 | 
   | filter context u8 low selectivity | 24.5 | 
   | filter u8 low selectivity | 24.2 | 
   | gt_eq Float32 | 23.6 | 
   | cast time32s to time64us 512 | 23.6 | 
   | cast float32 to int32 512 | 22.5 | 
   | multiply 512 | 21.2 | 
   | buffer_bit_ops and | 20.3 | 
   | gt_eq scalar Float32 | 20.1 | 
   | take str 1024 | 19.5 | 
   | subtract 512 | 19.5 | 
   | cast int32 to float32 512 | 19.0 | 
   | take str null indices 1024 | 19.0 | 
   | take str null values 1024 | 17.4 | 
   | and | 17.4 | 
   | struct_array_from_vec 256 | 16.8 | 
   | or | 16.1 | 
   | not | 15.8 | 
   | take str 512 | 15.1 | 
   | cast int32 to uint32 512 | 14.8 | 
   | add_nulls_512 | 14.2 | 
   | add 512 | 14.0 | 
   | take str null indices 512 | 13.6 | 
   | filter u8 very low selectivity | 12.9 | 
   | gt scalar Float32 | 12.5 | 
   | take i32 512 | 12.5 | 
   | filter context u8 w NULLs low selectivity | 12.4 | 
   | concat i32 nulls 1024 | 10.5 | 
   | concat str nulls 1024 | 10.1 | 
   | min string 512 | 9.5 | 
   | equal_string_nulls_512 | 9.2 | 
   | take i32 1024 | 8.3 | 
   | take i32 nulls 1024 | 8.0 | 
   | array_slice 2048 | 7.6 | 
   | take i32 nulls 512 | 7.5 | 
   | divide_nulls_512 | 7.1 | 
   | take str null values null indices 1024 | 6.8 | 
   | cast time32s to time32ms 512 | 6.0 | 
   | divide 512 | 5.5 | 
   | array_string_from_vec 512 | 4.6 | 
   | min 512 | 4.6 | 
   | array_slice 512 | 4.4 | 
   | concat i32 1024 | 4.0 | 
   | cast timestamp_ms to timestamp_ns 512 | 3.2 | 
   | struct_array_from_vec 128 | 2.8 | 
   | array_string_from_vec 256 | 2.7 | 
   | filter context f32 high selectivity | 2.6 | 
   | array_slice 128 | 2.5 | 
   | nlike_utf8 scalar complex | 2.2 | 
   | cast timestamp_ms to i64 512 | 1.9 | 
   | like_utf8 scalar complex | 1.8 | 
   | equal_string_512 | 1.6 | 
   | sort nulls 2^10 | 1.6 | 
   | equal_512 | 1.2 | 
   | take bool nulls 512 | 1.1 | 
   | limit 512, 512 | 0.8 | 
   | max nulls 512 | 0.6 | 
   | sum nulls 512 | 0.4 | 
   | min nulls 512 | 0.3 | 
   | max 512 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | -0.2 | 
   | sort 2^12 | -0.3 | 
   | filter context f32 low selectivity | -1.0 | 
   | array_string_from_vec 128 | -1.4 | 
   | sort 2^10 | -1.4 | 
   | cast int64 to int32 512 | -2.6 | 
   | take bool nulls 1024 | -2.9 | 
   | buffer_bit_ops or | -3.2 | 
   | take bool 512 | -4.0 | 
   | filter context u8 high selectivity | -4.5 | 
   | filter context u8 w NULLs very low selectivity | -4.6 | 
   | length | -5.1 | 
   | min nulls string 512 | -5.3 | 
   | nlike_utf8 scalar contains | -5.6 | 
   | take bool 1024 | -6.1 | 
   | filter u8 high selectivity | -6.2 | 
   | array_from_vec 256 | -6.3 | 
   | array_from_vec 512 | -7.0 | 
   | like_utf8 scalar contains | -7.1 | 
   | filter context u8 w NULLs high selectivity | -8.2 | 
   | array_from_vec 128 | -9.3 | 
   | nlike_utf8 scalar starts with | -15.6 | 
   | nlike_utf8 scalar equals | -18.5 | 
   | nlike_utf8 scalar ends with | -20.1 | 
   | like_utf8 scalar ends with | -23.5 | 
   | like_utf8 scalar starts with | -27.8 | 
   | like_utf8 scalar equals | -43.8 | 
   | record_batches_to_csv | -52.9 |
   



[GitHub] [arrow] codecov-io commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

codecov-io commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-743967914


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=h1) Report
   > Merging [#8796](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=desc) (fceb354) into [master](https://codecov.io/gh/apache/arrow/commit/0c8b9903602e1cde0a20b825abf92d361af3c315?el=desc) (0c8b990) will **decrease** coverage by `0.18%`.
   > The diff coverage is `85.48%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8796/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8796      +/-   ##
   ==========================================
   - Coverage   76.77%   76.59%   -0.19%     
   ==========================================
     Files         181      180       -1     
     Lines       41009    40735     -274     
   ==========================================
   - Hits        31485    31201     -284     
   - Misses       9524     9534      +10     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `90.54% <ø> (-1.25%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_union.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfdW5pb24ucnM=) | `87.55% <ø> (-2.24%)` | :arrow_down: |
   | [rust/arrow/src/array/data.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZGF0YS5ycw==) | `93.40% <0.00%> (-3.85%)` | :arrow_down: |
   | [rust/arrow/src/array/raw\_pointer.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvcmF3X3BvaW50ZXIucnM=) | `100.00% <ø> (ø)` | |
   | [rust/arrow/src/array/transform/list.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvdHJhbnNmb3JtL2xpc3QucnM=) | `36.36% <0.00%> (ø)` | |
   | [rust/arrow/src/bitmap.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYml0bWFwLnJz) | `84.74% <0.00%> (-6.78%)` | :arrow_down: |
   | [rust/arrow/src/bytes.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnl0ZXMucnM=) | `41.37% <50.00%> (-12.68%)` | :arrow_down: |
   | [rust/parquet/src/arrow/record\_reader.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9hcnJvdy9yZWNvcmRfcmVhZGVyLnJz) | `96.25% <88.88%> (+1.71%)` | :arrow_up: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.47% <100.00%> (ø)` | |
   | [rust/arrow/src/array/array\_boolean.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYm9vbGVhbi5ycw==) | `86.50% <100.00%> (-0.22%)` | :arrow_down: |
   | ... and [11 more](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=footer). Last update [0c8b990...fceb354](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-743704340


   So, a quick update:
   
   * I haven't run the benches for SIMD yet. This requires a nightly run on my computer and I keep forgetting
   * A green CI on this depends on removing an unsafe struct on the parquet crate, which I have so far been unable to do; see #8829 
   
   I was trying to finalize this before 3.0, but this now seems unlikely, as I can't figure out a way to make the CI of #8829 green.
   



[GitHub] [arrow] Dandandan edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

Dandandan edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-736283187


   @jorgecarleitao 
   Maybe I'm saying something weird/impossible, but would it also be possible/beneficial to store the (mutable) buffer in a `Vec<T>`? 
   This way it could simplify mutation of the buffer for the different types, while also relying less on unsafe code / code that could segfault or lead to other errors when used wrongly. In profiling/benchmarks I saw there are major inefficiencies related to writing values as individual bytes instead of being able to store them directly in the builder API (e.g. in the append function).
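
The difference between the two write paths can be sketched as follows. Both functions are hypothetical and only illustrate the idea; they are not the builder API of the arrow crate. The first appends each value through its byte representation into an untyped `Vec<u8>`, the second appends typed values directly.

```rust
/// Appends each value to an untyped byte buffer, one value at a time,
/// via its native-endian byte representation.
fn append_as_bytes(buffer: &mut Vec<u8>, values: &[i32]) {
    for v in values {
        buffer.extend_from_slice(&v.to_ne_bytes());
    }
}

/// Appends the values directly to a typed buffer in a single call,
/// with no per-value conversion to bytes.
fn append_typed(buffer: &mut Vec<i32>, values: &[i32]) {
    buffer.extend_from_slice(values);
}
```

Both produce the same bytes in memory; the typed version needs no `unsafe` and leaves the copy to `Vec`, which is the simplification the comment is asking about.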
   
   For the rest, I think it really makes sense to push this idea forward, as the current implementation is much more complicated without a good reason. I think that with `Vec` it will actually be easier to optimize for performance. I agree with @alamb that getting rid of the other code is beneficial; the benchmarks at least don't show clear regressions.
   
   Really look forward to those benchmarks too @alamb !



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-737336658


   @alamb I agree with the "as simple as possible, but no simpler" rule.
   
   I also agree with your concerns about the benches being run on ad-hoc hardware. It makes it more difficult to reproduce and draw conclusions.
   
   @Dandandan , I do not think so, but you may have better ideas than me:
   
   The way I currently see it, `ArrayData` is 'array-type'-independent. If we make buffers generic over T, we need to find a way to write `ArrayData`. We could make it logic-dependent, but then we lose the flexibility of a non-generic `ArrayData`, particularly on composite types such as `ListArray`, which have children of generic types.
   
   One way out would be to make `ArrayData` dynamically typed, so that it can hold arbitrary children, but I think that at some point we will need to downcast them, as we will need to extract which type T their buffers contain. This is just my analysis and in no way a definitive answer about this, though ^_^
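
A rough sketch of the dynamically typed option, using `std::any::Any` from the standard library. The type `DynArrayData` and its methods are invented for this illustration and do not reflect the actual `ArrayData` in the arrow crate; the point is only to show where the downcast ends up.

```rust
use std::any::Any;

/// Hypothetical type-erased container: each buffer is a `Vec<T>` for
/// some `T`, stored behind `dyn Any` so the container is not generic.
struct DynArrayData {
    buffers: Vec<Box<dyn Any>>,
}

impl DynArrayData {
    fn new() -> Self {
        DynArrayData { buffers: Vec::new() }
    }

    /// Stores a typed buffer, erasing its element type.
    fn push<T: 'static>(&mut self, values: Vec<T>) {
        self.buffers.push(Box::new(values));
    }

    /// Tries to view buffer `i` as a `Vec<T>`. Returns `None` if the
    /// index is out of range or the stored element type is not `T`.
    fn buffer<T: 'static>(&self, i: usize) -> Option<&Vec<T>> {
        let boxed = self.buffers.get(i)?;
        boxed.downcast_ref::<Vec<T>>()
    }
}
```

Every consumer has to name `T` at the call to `buffer`, and a wrong guess only shows up at runtime as `None`, which is the downcasting cost described above.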
   



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-737354190


   @houqp I will benchmark against simd and will post the results on the PR's table ASAP.



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735383594

Thanks a lot for all the comments so far.

@jhorstmann really good points. Unfortunately, I do not think it is just the asserts, because `Vec` performs the same asserts, since it is also safe code.

With respect to FFI, @jhorstmann and @nevi-me: I don't think FFI can rely on it. As @jhorstmann mentions, since this is only a recommendation, implementations must be able to handle non-aligned buffers. The Rust implementation is even funnier here, because the C data interface has no API to export `Buffer::offset` (only `Array::offset`). This implies that we need to offset the pointer by `Buffer::offset` when we export to the C data interface (details on #8401). I think that this makes the receiving end unable to determine whether the allocated region is aligned or not.

I think that this `Buffer::offset` may also destroy the benefit of alignment in our own implementation, as `ArrayData::data` will output a non-aligned byte slice whenever `Buffer::offset` is not 0. To use the aligned memory, I think we would need to use the data without the offset, perform the SIMD operation in chunks of 64 bytes _starting at the beginning of the buffer_, and then pass the offset to the new buffer.
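
A small sketch of why a non-zero offset loses the alignment, assuming a 64-byte aligned allocation made directly with `std::alloc`. The helper names `is_aligned` and `alignment_at_offset` are made up for illustration and are not part of the arrow crate.

```rust
use std::alloc::{alloc, dealloc, Layout};

/// Cache-line alignment discussed in this thread.
const ALIGNMENT: usize = 64;

/// Returns whether `ptr` sits on a 64-byte boundary.
fn is_aligned(ptr: *const u8) -> bool {
    (ptr as usize) % ALIGNMENT == 0
}

/// Allocates `capacity` bytes aligned to 64 bytes and reports the alignment
/// of (the start of the region, the pointer after skipping `offset` bytes),
/// which is what a sliced buffer would hand to a kernel.
fn alignment_at_offset(capacity: usize, offset: usize) -> (bool, bool) {
    assert!(capacity > 0 && offset < capacity);
    let layout = Layout::from_size_align(capacity, ALIGNMENT).expect("valid layout");
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null(), "allocation failed");
        // Same address arithmetic as exporting `pointer + offset`.
        let shifted = base.add(offset);
        let result = (is_aligned(base), is_aligned(shifted));
        dealloc(base, layout);
        result
    }
}
```

With `offset = 4` the region itself is aligned but the pointer a consumer sees is not; only offsets that are multiples of 64 keep the property.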

@ritchie46 good question. The benchmarks include allocations and mutations, as they cover a wide range of situations.

@Dandandan that is also my current hypothesis: the implementation is competing with some of the brightest minds when we try to re-invent a `Vec`, and the benefits of 64-byte aligned memory do not overcome the benefits of a highly optimized container (`Vec`).

[GitHub] [arrow] houqp commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

houqp commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735606971


   Great simplification indeed :+1:  I have seen conflicting assertions about misaligned access in simd online, from minor overhead to significant performance impact. I am now very curious what the benchmark result will look like with simd feature gate turned on.


[GitHub] [arrow] TimDiekmann edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

TimDiekmann edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735368785


   > It seems rust will soon get support for [custom allocators for `Vec`](https://github.com/rust-lang/rust/pull/78461), that way we could get both a simplified internal api and still ensure padded allocations using a custom allocator.
   
   Let me hook in here for a moment. Although it is true that custom allocators for `Vec` are now implemented, be aware that this is not yet stable and the API may undergo some changes. Without `#![feature(allocator_api)]` it's not even possible to declare `Vec<T, _>` or `Vec<T, Global>`.


[GitHub] [arrow] ritchie46 commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

ritchie46 commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735378815


   Do these benchmarks also include writing to the buffers, or only allocating/deallocating? 
   
   I ask because I already use a wrapper around a Rust `Vec` that overrides `reserve` to use Arrow's memory-aligned allocation.
   
   However, for writing and indexing it uses the default `Vec` methods. I found this to be a lot faster for creating Arrow arrays. I use it when I know I don't have null values.


[GitHub] [arrow] codecov-io edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

codecov-io edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-743967914


   # [Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=h1) Report
   > Merging [#8796](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=desc) (8a1c52c) into [master](https://codecov.io/gh/apache/arrow/commit/091df202ceb586b92882f67577ff720664e63eff?el=desc) (091df20) will **decrease** coverage by `0.12%`.
   > The diff coverage is `83.60%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/8796/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=tree)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #8796      +/-   ##
   ==========================================
   - Coverage   83.22%   83.10%   -0.13%     
   ==========================================
     Files         196      195       -1     
     Lines       48232    47977     -255     
   ==========================================
   - Hits        40142    39869     -273     
   - Misses       8090     8108      +18     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=tree) | Coverage Δ | |
   |---|---|---|
   | [rust/arrow/src/array/array\_primitive.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfcHJpbWl0aXZlLnJz) | `90.79% <ø> (-1.48%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_union.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfdW5pb24ucnM=) | `87.55% <ø> (-2.24%)` | :arrow_down: |
   | [rust/arrow/src/array/data.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvZGF0YS5ycw==) | `93.40% <0.00%> (-3.85%)` | :arrow_down: |
   | [rust/arrow/src/array/raw\_pointer.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvcmF3X3BvaW50ZXIucnM=) | `100.00% <ø> (ø)` | |
   | [rust/arrow/src/bitmap.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYml0bWFwLnJz) | `84.74% <0.00%> (-6.78%)` | :arrow_down: |
   | [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.28% <ø> (ø)` | |
   | [rust/arrow/src/bytes.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnl0ZXMucnM=) | `41.37% <50.00%> (-12.68%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_binary.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYmluYXJ5LnJz) | `90.73% <100.00%> (ø)` | |
   | [rust/arrow/src/array/array\_boolean.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfYm9vbGVhbi5ycw==) | `86.50% <100.00%> (-0.22%)` | :arrow_down: |
   | [rust/arrow/src/array/array\_list.rs](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYXJyYXkvYXJyYXlfbGlzdC5ycw==) | `92.74% <100.00%> (-0.38%)` | :arrow_down: |
   | ... and [11 more](https://codecov.io/gh/apache/arrow/pull/8796/diff?src=pr&el=tree-more) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=continue).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=footer). Last update [091df20...8a1c52c](https://codecov.io/gh/apache/arrow/pull/8796?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
   


[GitHub] [arrow] jhorstmann commented on a change in pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jhorstmann commented on a change in pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#discussion_r546362442



##########
File path: rust/arrow/src/buffer.rs
##########
@@ -208,15 +158,8 @@ impl Buffer {
 /// allocated memory region.
 impl<T: AsRef<[u8]>> From<T> for Buffer {
     fn from(p: T) -> Self {
-        // allocate aligned memory buffer
-        let slice = p.as_ref();
-        let len = slice.len() * mem::size_of::<u8>();
-        let capacity = bit_util::round_upto_multiple_of_64(len);
-        let buffer = memory::allocate_aligned(capacity);
-        unsafe {
-            memory::memcpy(buffer, slice.as_ptr(), len);
-            Buffer::build_with_arguments(buffer, len, Deallocation::Native(capacity))
-        }
+        let bytes = unsafe { Bytes::new(p.as_ref().to_vec(), Deallocation::Native) };

Review comment:
       There could be potential for further optimization here: `to_vec` has to copy the slice contents, a separate implementation of `From<Vec<u8>>` or `From<Vec<ArrowPrimitiveType>>` could avoid that copy and speed up several kernels involving primitives or list offsets.
   
   As a `From` implementation that would give a "conflicting implementations" error, an explicit `from_vec` method could work. I'd suggest trying it in a separate PR as it could change a bunch of code not directly related to the refactoring in this PR.
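
To illustrate the suggestion, here is a minimal sketch under the assumption that the buffer simply owns a `Vec<u8>`. The `Buffer` below is a hypothetical stand-in, not the type from this PR, and `from_vec`, `as_ptr` and `as_slice` are illustrative names.

```rust
/// Hypothetical stand-in for the PR's `Buffer`; it just owns a `Vec<u8>`.
struct Buffer {
    data: Vec<u8>,
}

/// Generic conversion, as in the diff: it accepts any byte-like input but
/// has to copy, because it only ever sees a borrowed slice.
impl<T: AsRef<[u8]>> From<T> for Buffer {
    fn from(p: T) -> Self {
        Buffer { data: p.as_ref().to_vec() }
    }
}

// Adding `impl From<Vec<u8>> for Buffer` here would be rejected with a
// "conflicting implementations" error, because `Vec<u8>` already
// implements `AsRef<[u8]>` and is covered by the impl above.

impl Buffer {
    /// Takes ownership of the vector, so no bytes are copied.
    fn from_vec(data: Vec<u8>) -> Self {
        Buffer { data }
    }

    /// Address of the first byte, used to check whether a copy happened.
    fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    /// Read-only view of the stored bytes.
    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}
```

Comparing `as_ptr()` before and after the conversion shows the difference: `Buffer::from_vec(v)` keeps the allocation of `v`, while `Buffer::from(&v)` produces a second allocation with the same contents.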




[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-748470192


   I have now rebased this against master. After @jhorstmann's fix to the out-of-bounds issue in #8954, it now runs correctly.
   
   Here are the results:
   
   # no SIMD
   
   ```
   git checkout master
   cargo bench --benches
   git checkout buffer2
   cargo bench --benches
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | max 512 | 1030.1 | 
   | sum 512 | 972.0 | 
   | min 512 | 928.2 | 
   | max nulls 512 | 298.8 | 
   | min nulls 512 | 284.1 | 
   | gt_eq Float32 | 207.0 | 
   | sum nulls 512 | 201.6 | 
   | eq Float32 | 195.5 | 
   | lt_eq Float32 | 188.5 | 
   | neq scalar Float32 | 163.3 | 
   | divide_nulls_512 | 131.9 | 
   | divide 512 | 130.4 | 
   | lt_eq scalar Float32 | 128.3 | 
   | add 512 | 118.9 | 
   | add_nulls_512 | 115.7 | 
   | record_batches_to_csv | 112.5 | 
   | lt Float32 | 112.2 | 
   | subtract 512 | 107.8 | 
   | neq Float32 | 106.8 | 
   | gt Float32 | 106.4 | 
   | multiply 512 | 105.0 | 
   | eq scalar Float32 | 100.6 | 
   | gt_eq scalar Float32 | 80.8 | 
   | lt scalar Float32 | 66.5 | 
   | cast date64 to date32 512 | 63.2 | 
   | buffer_bit_ops and | 59.8 | 
   | buffer_bit_ops or | 58.2 | 
   | cast int32 to float64 512 | 52.4 | 
   | cast float64 to float32 512 | 43.0 | 
   | cast time64ns to time32s 512 | 42.9 | 
   | cast time32s to time32ms 512 | 41.0 | 
   | filter context f32 very low selectivity | 40.5 | 
   | cast date32 to date64 512 | 37.0 | 
   | gt scalar Float32 | 35.9 | 
   | cast int32 to float32 512 | 35.5 | 
   | struct_array_from_vec 1024 | 32.6 | 
   | lt scalar Float32 | 32.4 | 
   | concat str 1024 | 31.6 | 
   | cast int32 to int64 512 | 29.8 | 
   | cast float64 to uint64 512 | 27.7 | 
   | filter context u8 very low selectivity | 26.8 | 
   | take str 1024 | 26.6 | 
   | concat str nulls 1024 | 26.3 | 
   | take str null indices 1024 | 25.2 | 
   | take str null values 1024 | 25.1 | 
   | struct_array_from_vec 512 | 24.8 | 
   | cast timestamp_ms to timestamp_ns 512 | 20.8 | 
   | cast float32 to int32 512 | 20.3 | 
   | filter u8 very low selectivity | 19.7 | 
   | take str null indices 512 | 19.2 | 
   | take str 512 | 18.7 | 
   | cast time32s to time64us 512 | 17.3 | 
   | nlike_utf8 scalar equals | 16.1 | 
   | take bool 1024 | 15.8 | 
   | struct_array_from_vec 256 | 15.7 | 
   | array_from_vec 128 | 13.6 | 
   | nlike_utf8 scalar ends with | 13.4 | 
   | take i32 nulls 512 | 13.0 | 
   | equal_string_512 | 12.1 | 
   | take i32 512 | 11.7 | 
   | take i32 nulls 1024 | 11.1 | 
   | take str null values null indices 1024 | 10.3 | 
   | take i32 1024 | 10.2 | 
   | filter context u8 low selectivity | 10.0 | 
   | cast int32 to uint32 512 | 9.7 | 
   | filter u8 low selectivity | 9.4 | 
   | array_from_vec 256 | 8.8 | 
   | array_from_vec 512 | 8.0 | 
   | filter context u8 w NULLs high selectivity | 7.9 | 
   | filter u8 high selectivity | 7.9 | 
   | length | 7.5 | 
   | filter context u8 w NULLs low selectivity | 7.4 | 
   | min string 512 | 6.9 | 
   | equal_nulls_512 | 6.7 | 
   | filter context u8 high selectivity | 6.6 | 
   | like_utf8 scalar equals | 6.5 | 
   | array_string_from_vec 128 | 6.5 | 
   | like_utf8 scalar complex | 6.0 | 
   | filter context f32 high selectivity | 5.6 | 
   | min nulls 512 | 5.3 | 
   | like_utf8 scalar starts with | 5.2 | 
   | like_utf8 scalar contains | 5.1 | 
   | divide 512 | 4.8 | 
   | nlike_utf8 scalar contains | 4.8 | 
   | take bool nulls 1024 | 4.6 | 
   | concat i32 1024 | 4.5 | 
   | divide_nulls_512 | 4.4 | 
   | struct_array_from_vec 128 | 4.2 | 
   | min nulls string 512 | 4.2 | 
   | array_string_from_vec 256 | 3.4 | 
   | equal_string_nulls_512 | 2.7 | 
   | and | 2.6 | 
   | or | 2.4 | 
   | filter context u8 w NULLs very low selectivity | 2.3 | 
   | take bool nulls 512 | 2.2 | 
   | equal_512 | 2.0 | 
   | filter context f32 low selectivity | 1.8 | 
   | cast int64 to int32 512 | 1.5 | 
   | sort 2^12 | 1.5 | 
   | take bool 512 | 1.4 | 
   | min nulls string 512 | 1.4 | 
   | limit 512, 512 | 1.4 | 
   | not | 1.2 | 
   | array_string_from_vec 512 | 1.1 | 
   | nlike_utf8 scalar complex | 0.9 | 
   | gt_eq scalar Float32 | 0.9 | 
   | eq scalar Float32 | 0.7 | 
   | cast int64 to int32 512 | 0.7 | 
   | cast int32 to int32 512 | 0.5 | 
   | cast int32 to int32 512 | 0.5 | 
   | array_string_from_vec 512 | 0.5 | 
   | sort 2^10 | 0.4 | 
   | cast timestamp_ns to timestamp_s 512 | 0.4 | 
   | neq scalar Float32 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | 0.2 | 
   | gt Float32 | -0.3 | 
   | min 512 | -0.3 | 
   | lt Float32 | -0.4 | 
   | sort nulls 2^12 | -0.4 | 
   | take bool nulls 512 | -0.6 | 
   | sort 2^12 | -0.6 | 
   | not | -0.8 | 
   | array_string_from_vec 256 | -0.9 | 
   | sort nulls 2^10 | -1.1 | 
   | equal_512 | -1.4 | 
   | array_slice 128 | -1.5 | 
   | nlike_utf8 scalar starts with | -1.5 | 
   | max nulls 512 | -1.7 | 
   | array_slice 512 | -1.7 | 
   | filter context u8 w NULLs very low selectivity | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.9 | 
   | length | -2.0 | 
   | array_slice 512 | -2.0 | 
   | nlike_utf8 scalar complex | -2.1 | 
   | and | -2.3 | 
   | nlike_utf8 scalar contains | -2.5 | 
   | or | -2.6 | 
   | add 512 | -2.6 | 
   | struct_array_from_vec 128 | -2.7 | 
   | like_utf8 scalar starts with | -2.9 | 
   | concat i32 1024 | -3.3 | 
   | like_utf8 scalar contains | -3.4 | 
   | limit 512, 512 | -3.4 | 
   | multiply 512 | -3.5 | 
   | cast time32s to time32ms 512 | -3.6 | 
   | array_from_vec 512 | -3.7 | 
   | sort nulls 2^12 | -3.8 | 
   | subtract 512 | -3.9 | 
   | filter context f32 high selectivity | -4.0 | 
   | take bool nulls 1024 | -4.3 | 
   | add_nulls_512 | -4.9 | 
   | array_slice 2048 | -5.0 | 
   | like_utf8 scalar complex | -5.1 | 
   | cast timestamp_ms to timestamp_ns 512 | -5.2 | 
   | sum 512 | -5.3 | 
   | min string 512 | -5.5 | 
   | like_utf8 scalar ends with | -6.1 | 
   | array_string_from_vec 128 | -6.2 | 
   | concat i32 nulls 1024 | -6.2 | 
   | take i32 1024 | -6.5 | 
   | cast time32s to time64us 512 | -6.6 | 
   | take i32 nulls 1024 | -7.1 | 
   | take i32 nulls 512 | -7.2 | 
   | like_utf8 scalar ends with | -7.6 | 
   | like_utf8 scalar equals | -7.7 | 
   | filter context f32 low selectivity | -8.2 | 
   | nlike_utf8 scalar starts with | -8.6 | 
   | take str null values null indices 1024 | -8.7 | 
   | buffer_bit_ops or | -9.0 | 
   | equal_string_nulls_512 | -9.2 | 
   | array_from_vec 256 | -9.3 | 
   | cast int32 to uint32 512 | -9.3 | 
   | nlike_utf8 scalar ends with | -9.8 | 
   | take i32 512 | -9.9 | 
   | filter context u8 w NULLs low selectivity | -10.4 | 
   | array_from_vec 128 | -10.5 | 
   | equal_string_512 | -12.1 | 
   | filter context u8 high selectivity | -12.2 | 
   | struct_array_from_vec 256 | -12.7 | 
   | filter u8 high selectivity | -12.8 | 
   | buffer_bit_ops and | -13.0 | 
   | take bool 1024 | -13.3 | 
   | filter context u8 w NULLs high selectivity | -13.3 | 
   | take str null indices 512 | -14.4 | 
   | take str 512 | -14.5 | 
   | sum nulls 512 | -14.6 | 
   | filter u8 very low selectivity | -15.9 | 
   | concat str nulls 1024 | -18.0 | 
   | take str null values 1024 | -18.9 | 
   | take str null indices 1024 | -18.9 | 
   | cast float32 to int32 512 | -19.0 | 
   | take str 1024 | -19.7 | 
   | struct_array_from_vec 512 | -19.8 | 
   | cast int32 to int64 512 | -20.6 | 
   | filter context u8 very low selectivity | -21.1 | 
   | cast float64 to uint64 512 | -21.1 | 
   | cast int32 to float32 512 | -24.2 | 
   | struct_array_from_vec 1024 | -24.3 | 
   | nlike_utf8 scalar equals | -24.9 | 
   | filter u8 low selectivity | -26.1 | 
   | filter context u8 low selectivity | -26.3 | 
   | concat str 1024 | -27.6 | 
   | cast float64 to float32 512 | -27.6 | 
   | cast date32 to date64 512 | -28.1 | 
   | cast time64ns to time32s 512 | -29.5 | 
   | filter context f32 very low selectivity | -30.1 | 
   | cast int32 to float64 512 | -33.4 | 
   | cast date64 to date32 512 | -37.9 | 
   
   
   Jorges-MBP-2-2:arrow jorgecarleitao$ python3 -m parse
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | cast date64 to date32 512 | 63.2 | 
   | cast int32 to float64 512 | 52.4 | 
   | cast float64 to float32 512 | 43.0 | 
   | cast time64ns to time32s 512 | 42.9 | 
   | filter context f32 very low selectivity | 40.5 | 
   | cast date32 to date64 512 | 37.0 | 
   | cast int32 to float32 512 | 35.5 | 
   | struct_array_from_vec 1024 | 32.6 | 
   | lt scalar Float32 | 32.4 | 
   | concat str 1024 | 31.6 | 
   | cast int32 to int64 512 | 29.8 | 
   | cast float64 to uint64 512 | 27.7 | 
   | filter context u8 very low selectivity | 26.8 | 
   | take str 1024 | 26.6 | 
   | concat str nulls 1024 | 26.3 | 
   | take str null indices 1024 | 25.2 | 
   | take str null values 1024 | 25.1 | 
   | struct_array_from_vec 512 | 24.8 | 
   | cast float32 to int32 512 | 20.3 | 
   | filter u8 very low selectivity | 19.7 | 
   | take str null indices 512 | 19.2 | 
   | take str 512 | 18.7 | 
   | cast time32s to time64us 512 | 17.3 | 
   | nlike_utf8 scalar equals | 16.1 | 
   | struct_array_from_vec 256 | 15.7 | 
   | nlike_utf8 scalar ends with | 13.4 | 
   | take i32 nulls 512 | 13.0 | 
   | take i32 512 | 11.7 | 
   | take i32 nulls 1024 | 11.1 | 
   | take str null values null indices 1024 | 10.3 | 
   | take i32 1024 | 10.2 | 
   | filter context u8 low selectivity | 10.0 | 
   | cast int32 to uint32 512 | 9.7 | 
   | filter u8 low selectivity | 9.4 | 
   | filter context u8 w NULLs low selectivity | 7.4 | 
   | min string 512 | 6.9 | 
   | like_utf8 scalar equals | 6.5 | 
   | like_utf8 scalar complex | 6.0 | 
   | filter context f32 high selectivity | 5.6 | 
   | min nulls 512 | 5.3 | 
   | like_utf8 scalar contains | 5.1 | 
   | divide 512 | 4.8 | 
   | nlike_utf8 scalar contains | 4.8 | 
   | concat i32 1024 | 4.5 | 
   | divide_nulls_512 | 4.4 | 
   | struct_array_from_vec 128 | 4.2 | 
   | take bool nulls 512 | 2.2 | 
   | equal_512 | 2.0 | 
   | take bool 512 | 1.4 | 
   | min nulls string 512 | 1.4 | 
   | array_string_from_vec 512 | 1.1 | 
   | nlike_utf8 scalar complex | 0.9 | 
   | gt_eq scalar Float32 | 0.9 | 
   | eq scalar Float32 | 0.7 | 
   | cast int64 to int32 512 | 0.7 | 
   | cast int32 to int32 512 | 0.5 | 
   | neq scalar Float32 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | 0.2 | 
   | gt Float32 | -0.3 | 
   | min 512 | -0.3 | 
   | lt Float32 | -0.4 | 
   | sort nulls 2^12 | -0.4 | 
   | sort 2^12 | -0.6 | 
   | not | -0.8 | 
   | array_string_from_vec 256 | -0.9 | 
   | sort nulls 2^10 | -1.1 | 
   | nlike_utf8 scalar starts with | -1.5 | 
   | max nulls 512 | -1.7 | 
   | array_slice 512 | -1.7 | 
   | filter context u8 w NULLs very low selectivity | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.8 | 
   | length | -2.0 | 
   | and | -2.3 | 
   | or | -2.6 | 
   | add 512 | -2.6 | 
   | like_utf8 scalar starts with | -2.9 | 
   | limit 512, 512 | -3.4 | 
   | multiply 512 | -3.5 | 
   | cast time32s to time32ms 512 | -3.6 | 
   | array_from_vec 512 | -3.7 | 
   | subtract 512 | -3.9 | 
   | take bool nulls 1024 | -4.3 | 
   | add_nulls_512 | -4.9 | 
   | cast timestamp_ms to timestamp_ns 512 | -5.2 | 
   | sum 512 | -5.3 | 
   | like_utf8 scalar ends with | -6.1 | 
   | array_string_from_vec 128 | -6.2 | 
   | filter context f32 low selectivity | -8.2 | 
   | buffer_bit_ops or | -9.0 | 
   | equal_string_nulls_512 | -9.2 | 
   | array_from_vec 256 | -9.3 | 
   | array_from_vec 128 | -10.5 | 
   | equal_string_512 | -12.1 | 
   | filter context u8 high selectivity | -12.2 | 
   | filter u8 high selectivity | -12.8 | 
   | buffer_bit_ops and | -13.0 | 
   | take bool 1024 | -13.3 | 
   | filter context u8 w NULLs high selectivity | -13.3 | 
   | sum nulls 512 | -14.6 |
   
   # SIMD
   
   ```
   git checkout master
   cargo bench --benches --features simd
   git checkout buffer2
   cargo bench --benches --features simd
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | cast date64 to date32 512 | 64.0 | 
   | cast date32 to date64 512 | 49.5 | 
   | cast float64 to float32 512 | 46.1 | 
   | cast time64ns to time32s 512 | 44.2 | 
   | filter context f32 very low selectivity | 43.9 | 
   | lt scalar Float32 | 39.1 | 
   | lt_eq Float32 | 38.7 | 
   | cast int32 to int64 512 | 35.7 | 
   | struct_array_from_vec 1024 | 35.4 | 
   | lt_eq scalar Float32 | 35.4 | 
   | eq scalar Float32 | 34.2 | 
   | neq Float32 | 32.1 | 
   | cast int32 to float64 512 | 31.6 | 
   | concat str 1024 | 31.4 | 
   | gt Float32 | 30.3 | 
   | neq scalar Float32 | 29.8 | 
   | filter context u8 very low selectivity | 28.2 | 
   | equal_nulls_512 | 27.3 | 
   | eq Float32 | 27.1 | 
   | struct_array_from_vec 512 | 26.1 | 
   | cast float64 to uint64 512 | 25.9 | 
   | lt Float32 | 24.8 | 
   | filter context u8 low selectivity | 24.5 | 
   | filter u8 low selectivity | 24.2 | 
   | gt_eq Float32 | 23.6 | 
   | cast time32s to time64us 512 | 23.6 | 
   | cast float32 to int32 512 | 22.5 | 
   | multiply 512 | 21.2 | 
   | buffer_bit_ops and | 20.3 | 
   | gt_eq scalar Float32 | 20.1 | 
   | take str 1024 | 19.5 | 
   | subtract 512 | 19.5 | 
   | cast int32 to float32 512 | 19.0 | 
   | take str null indices 1024 | 19.0 | 
   | take str null values 1024 | 17.4 | 
   | and | 17.4 | 
   | struct_array_from_vec 256 | 16.8 | 
   | or | 16.1 | 
   | not | 15.8 | 
   | take str 512 | 15.1 | 
   | cast int32 to uint32 512 | 14.8 | 
   | add_nulls_512 | 14.2 | 
   | add 512 | 14.0 | 
   | take str null indices 512 | 13.6 | 
   | filter u8 very low selectivity | 12.9 | 
   | gt scalar Float32 | 12.5 | 
   | take i32 512 | 12.5 | 
   | filter context u8 w NULLs low selectivity | 12.4 | 
   | concat i32 nulls 1024 | 10.5 | 
   | concat str nulls 1024 | 10.1 | 
   | min string 512 | 9.5 | 
   | equal_string_nulls_512 | 9.2 | 
   | take i32 1024 | 8.3 | 
   | take i32 nulls 1024 | 8.0 | 
   | array_slice 2048 | 7.6 | 
   | take i32 nulls 512 | 7.5 | 
   | divide_nulls_512 | 7.1 | 
   | take str null values null indices 1024 | 6.8 | 
   | cast time32s to time32ms 512 | 6.0 | 
   | divide 512 | 5.5 | 
   | array_string_from_vec 512 | 4.6 | 
   | min 512 | 4.6 | 
   | array_slice 512 | 4.4 | 
   | concat i32 1024 | 4.0 | 
   | cast timestamp_ms to timestamp_ns 512 | 3.2 | 
   | struct_array_from_vec 128 | 2.8 | 
   | array_string_from_vec 256 | 2.7 | 
   | filter context f32 high selectivity | 2.6 | 
   | array_slice 128 | 2.5 | 
   | nlike_utf8 scalar complex | 2.2 | 
   | cast timestamp_ms to i64 512 | 1.9 | 
   | like_utf8 scalar complex | 1.8 | 
   | equal_string_512 | 1.6 | 
   | sort nulls 2^10 | 1.6 | 
   | equal_512 | 1.2 | 
   | take bool nulls 512 | 1.1 | 
   | limit 512, 512 | 0.8 | 
   | max nulls 512 | 0.6 | 
   | sum nulls 512 | 0.4 | 
   | min nulls 512 | 0.3 | 
   | max 512 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | -0.2 | 
   | sort 2^12 | -0.3 | 
   | filter context f32 low selectivity | -1.0 | 
   | array_string_from_vec 128 | -1.4 | 
   | sort 2^10 | -1.4 | 
   | cast int64 to int32 512 | -2.6 | 
   | take bool nulls 1024 | -2.9 | 
   | buffer_bit_ops or | -3.2 | 
   | take bool 512 | -4.0 | 
   | filter context u8 high selectivity | -4.5 | 
   | filter context u8 w NULLs very low selectivity | -4.6 | 
   | length | -5.1 | 
   | min nulls string 512 | -5.3 | 
   | nlike_utf8 scalar contains | -5.6 | 
   | take bool 1024 | -6.1 | 
   | filter u8 high selectivity | -6.2 | 
   | array_from_vec 256 | -6.3 | 
   | array_from_vec 512 | -7.0 | 
   | like_utf8 scalar contains | -7.1 | 
   | filter context u8 w NULLs high selectivity | -8.2 | 
   | array_from_vec 128 | -9.3 | 
   | nlike_utf8 scalar starts with | -15.6 | 
   | nlike_utf8 scalar equals | -18.5 | 
   | nlike_utf8 scalar ends with | -20.1 | 
   | like_utf8 scalar ends with | -23.5 | 
   | like_utf8 scalar starts with | -27.8 | 
   | like_utf8 scalar equals | -43.8 | 
   | record_batches_to_csv | -52.9 |
   


[GitHub] [arrow] github-actions[bot] commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

github-actions[bot] commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735291963


   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


[GitHub] [arrow] ritchie46 commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

ritchie46 commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-737477886


   > ArrayData is 'array-type'-independent. If we make buffers generic over T, we need to find a way to write ArrayData
   
   Would it perhaps be possible to have a hybrid solution? The buffer remains typeless `Vec<u8>`, but the public API exposes generic typed methods like `Buffer::push::<T>()`. Then some code regarding type conversion to bytes, alignment etc. could be abstracted.
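
A rough sketch of what such a hybrid could look like, assuming plain `Vec<u8>` storage. `ByteBuffer`, `NativeType` and `write_to` are hypothetical names chosen for illustration; only `push::<T>` mirrors the method proposed above.

```rust
/// Hypothetical trait for fixed-width values that can be appended as bytes.
trait NativeType: Copy {
    /// Appends the native-endian bytes of `self` to `out`.
    fn write_to(self, out: &mut Vec<u8>);
}

/// Implements `NativeType` for primitives via their `to_ne_bytes` method.
macro_rules! impl_native {
    ($($t:ty),*) => {$(
        impl NativeType for $t {
            fn write_to(self, out: &mut Vec<u8>) {
                out.extend_from_slice(&self.to_ne_bytes());
            }
        }
    )*};
}
impl_native!(u8, i32, i64, f32, f64);

/// Hypothetical buffer: untyped storage, typed entry points.
#[derive(Default)]
struct ByteBuffer {
    data: Vec<u8>,
}

impl ByteBuffer {
    /// Typed append; the byte conversion stays inside the buffer.
    fn push<T: NativeType>(&mut self, value: T) {
        value.write_to(&mut self.data);
    }

    /// Number of bytes stored so far.
    fn len(&self) -> usize {
        self.data.len()
    }

    /// Read-only view of the stored bytes.
    fn as_slice(&self) -> &[u8] {
        &self.data
    }
}
```

Writing is straightforward with this shape. Reading the bytes back as `&[T]` is the harder half, since a `Vec<u8>` only guarantees 1-byte alignment, so a typed view would need either an aligned allocation or a copy.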


[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-754843578


   I am closing this in favor of #9076 where the performance is fixed.
   
   The gist is that there were two reasons for the performance issues:
   
   1. we were using `std::alloc::alloc_zeroed` instead of `std::alloc::alloc`
   2. we were converting everything to a byte slice instead of writing directly to the buffer
   
   That PR addresses both and makes `MutableBuffer` faster than `Vec`.
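
For reference, a minimal sketch of the first point only, assuming a 64-byte aligned allocation that is fully overwritten right after it is created. The function `fill` is a made-up helper and is not code from #9076.

```rust
use std::alloc::{alloc, alloc_zeroed, dealloc, Layout};

/// Cache-line alignment used by the arrow buffers discussed here.
const ALIGNMENT: usize = 64;

/// Allocates `len` bytes (64-byte aligned), overwrites all of them with
/// `value`, and returns a copy of the contents. With `zero_first` the memory
/// is zeroed by the allocator before being overwritten, which is redundant
/// work whenever every byte is written anyway.
fn fill(len: usize, value: u8, zero_first: bool) -> Vec<u8> {
    assert!(len > 0);
    let layout = Layout::from_size_align(len, ALIGNMENT).expect("valid layout");
    unsafe {
        // `alloc_zeroed` guarantees zeroed memory; `alloc` returns
        // uninitialized memory that must be written before it is read.
        let ptr = if zero_first { alloc_zeroed(layout) } else { alloc(layout) };
        assert!(!ptr.is_null(), "allocation failed");
        std::ptr::write_bytes(ptr, value, len);
        let out = std::slice::from_raw_parts(ptr, len).to_vec();
        dealloc(ptr, layout);
        out
    }
}
```

Both paths produce the same bytes; the difference is only the extra zeroing pass, which matters for kernels that allocate an output buffer and then write every element of it.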


[GitHub] [arrow] nevi-me commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

nevi-me commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-735362452


   > Perhaps unsurprisingly to you, but a bit to me, this leads to a significant improvement over almost all benches
   
   You can consider me surprised too.
   
   This is very interesting. Can one then say that the alignment requirements from `arrow::memory` are the main cause of the difference? If we only enforce alignment at boundaries like IPC and FFI, could we still use `Vec<u8>` internally? I also don't think it should be much of an issue for Parquet, as we currently materialise Arrow data into the primitives that Parquet supports, due to the way arrays are indexed for definition levels.
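   
   As a rough sketch of what "enforcing alignment only at boundaries" could mean, the helpers below check alignment and pad a buffer's length just before it is handed off. The function names are hypothetical, and the alignment value is left to the caller (8 and 64 bytes are the usual candidates):
   
   ```rust
   /// Returns true if the slice starts at an address that is a multiple of `align`.
   pub fn is_aligned(bytes: &[u8], align: usize) -> bool {
       assert!(align.is_power_of_two());
       (bytes.as_ptr() as usize) % align == 0
   }
   
   /// Rounds `len` up to the next multiple of `align`.
   pub fn padded_len(len: usize, align: usize) -> usize {
       assert!(align.is_power_of_two());
       (len + align - 1) & !(align - 1)
   }
   
   /// Copies `bytes` into a new buffer whose length is padded with zeros to a
   /// multiple of `align`, as a writer might do before emitting it at a boundary.
   /// Note that this pads the length only; the address of a `Vec<u8>` is not
   /// guaranteed to be aligned beyond 1 byte.
   pub fn pad_for_boundary(bytes: &[u8], align: usize) -> Vec<u8> {
       let target = padded_len(bytes.len(), align);
       let mut out = Vec::with_capacity(target);
       out.extend_from_slice(bytes);
       out.resize(target, 0);
       out
   }
   ```
   
   Internal kernels would then work on plain `Vec<u8>` data, and only the IPC or FFI layer would call something like `is_aligned` and copy or pad when the check fails.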



[GitHub] [arrow] jhorstmann commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jhorstmann commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-745155714


   @jorgecarleitao I had a quick look at the failing test. It actually uses the addition kernel to prepare test data, and I think that is where the problem is. I'll try to find time to have a deeper look at it today.



[GitHub] [arrow] jorgecarleitao commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-748470192


   I have now rebased this against master. After @jhorstmann's fix for the out-of-bounds issue in #8954, it now runs correctly.
   
   Here are the results:
   
   # no SIMD
   
   ```
   git checkout master
   cargo bench --benches
   git checkout buffer2
   cargo bench --benches
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | max 512 | 1030.1 | 
   | sum 512 | 972.0 | 
   | min 512 | 928.2 | 
   | max nulls 512 | 298.8 | 
   | min nulls 512 | 284.1 | 
   | gt_eq Float32 | 207.0 | 
   | sum nulls 512 | 201.6 | 
   | eq Float32 | 195.5 | 
   | lt_eq Float32 | 188.5 | 
   | neq scalar Float32 | 163.3 | 
   | divide_nulls_512 | 131.9 | 
   | divide 512 | 130.4 | 
   | lt_eq scalar Float32 | 128.3 | 
   | add 512 | 118.9 | 
   | add_nulls_512 | 115.7 | 
   | record_batches_to_csv | 112.5 | 
   | lt Float32 | 112.2 | 
   | subtract 512 | 107.8 | 
   | neq Float32 | 106.8 | 
   | gt Float32 | 106.4 | 
   | multiply 512 | 105.0 | 
   | eq scalar Float32 | 100.6 | 
   | gt_eq scalar Float32 | 80.8 | 
   | lt scalar Float32 | 66.5 | 
   | cast date64 to date32 512 | 63.2 | 
   | buffer_bit_ops and | 59.8 | 
   | buffer_bit_ops or | 58.2 | 
   | cast int32 to float64 512 | 52.4 | 
   | cast float64 to float32 512 | 43.0 | 
   | cast time64ns to time32s 512 | 42.9 | 
   | cast time32s to time32ms 512 | 41.0 | 
   | filter context f32 very low selectivity | 40.5 | 
   | cast date32 to date64 512 | 37.0 | 
   | gt scalar Float32 | 35.9 | 
   | cast int32 to float32 512 | 35.5 | 
   | struct_array_from_vec 1024 | 32.6 | 
   | lt scalar Float32 | 32.4 | 
   | concat str 1024 | 31.6 | 
   | cast int32 to int64 512 | 29.8 | 
   | cast float64 to uint64 512 | 27.7 | 
   | filter context u8 very low selectivity | 26.8 | 
   | take str 1024 | 26.6 | 
   | concat str nulls 1024 | 26.3 | 
   | take str null indices 1024 | 25.2 | 
   | take str null values 1024 | 25.1 | 
   | struct_array_from_vec 512 | 24.8 | 
   | cast timestamp_ms to timestamp_ns 512 | 20.8 | 
   | cast float32 to int32 512 | 20.3 | 
   | filter u8 very low selectivity | 19.7 | 
   | take str null indices 512 | 19.2 | 
   | take str 512 | 18.7 | 
   | cast time32s to time64us 512 | 17.3 | 
   | nlike_utf8 scalar equals | 16.1 | 
   | take bool 1024 | 15.8 | 
   | struct_array_from_vec 256 | 15.7 | 
   | array_from_vec 128 | 13.6 | 
   | nlike_utf8 scalar ends with | 13.4 | 
   | take i32 nulls 512 | 13.0 | 
   | equal_string_512 | 12.1 | 
   | take i32 512 | 11.7 | 
   | take i32 nulls 1024 | 11.1 | 
   | take str null values null indices 1024 | 10.3 | 
   | take i32 1024 | 10.2 | 
   | filter context u8 low selectivity | 10.0 | 
   | cast int32 to uint32 512 | 9.7 | 
   | filter u8 low selectivity | 9.4 | 
   | array_from_vec 256 | 8.8 | 
   | array_from_vec 512 | 8.0 | 
   | filter context u8 w NULLs high selectivity | 7.9 | 
   | filter u8 high selectivity | 7.9 | 
   | length | 7.5 | 
   | filter context u8 w NULLs low selectivity | 7.4 | 
   | min string 512 | 6.9 | 
   | equal_nulls_512 | 6.7 | 
   | filter context u8 high selectivity | 6.6 | 
   | like_utf8 scalar equals | 6.5 | 
   | array_string_from_vec 128 | 6.5 | 
   | like_utf8 scalar complex | 6.0 | 
   | filter context f32 high selectivity | 5.6 | 
   | min nulls 512 | 5.3 | 
   | like_utf8 scalar starts with | 5.2 | 
   | like_utf8 scalar contains | 5.1 | 
   | divide 512 | 4.8 | 
   | nlike_utf8 scalar contains | 4.8 | 
   | take bool nulls 1024 | 4.6 | 
   | concat i32 1024 | 4.5 | 
   | divide_nulls_512 | 4.4 | 
   | struct_array_from_vec 128 | 4.2 | 
   | min nulls string 512 | 4.2 | 
   | array_string_from_vec 256 | 3.4 | 
   | equal_string_nulls_512 | 2.7 | 
   | and | 2.6 | 
   | or | 2.4 | 
   | filter context u8 w NULLs very low selectivity | 2.3 | 
   | take bool nulls 512 | 2.2 | 
   | equal_512 | 2.0 | 
   | filter context f32 low selectivity | 1.8 | 
   | cast int64 to int32 512 | 1.5 | 
   | sort 2^12 | 1.5 | 
   | take bool 512 | 1.4 | 
   | min nulls string 512 | 1.4 | 
   | limit 512, 512 | 1.4 | 
   | not | 1.2 | 
   | array_string_from_vec 512 | 1.1 | 
   | nlike_utf8 scalar complex | 0.9 | 
   | gt_eq scalar Float32 | 0.9 | 
   | eq scalar Float32 | 0.7 | 
   | cast int64 to int32 512 | 0.7 | 
   | cast int32 to int32 512 | 0.5 | 
   | cast int32 to int32 512 | 0.5 | 
   | array_string_from_vec 512 | 0.5 | 
   | sort 2^10 | 0.4 | 
   | cast timestamp_ns to timestamp_s 512 | 0.4 | 
   | neq scalar Float32 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | 0.2 | 
   | gt Float32 | -0.3 | 
   | min 512 | -0.3 | 
   | lt Float32 | -0.4 | 
   | sort nulls 2^12 | -0.4 | 
   | take bool nulls 512 | -0.6 | 
   | sort 2^12 | -0.6 | 
   | not | -0.8 | 
   | array_string_from_vec 256 | -0.9 | 
   | sort nulls 2^10 | -1.1 | 
   | equal_512 | -1.4 | 
   | array_slice 128 | -1.5 | 
   | nlike_utf8 scalar starts with | -1.5 | 
   | max nulls 512 | -1.7 | 
   | array_slice 512 | -1.7 | 
   | filter context u8 w NULLs very low selectivity | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.9 | 
   | length | -2.0 | 
   | array_slice 512 | -2.0 | 
   | nlike_utf8 scalar complex | -2.1 | 
   | and | -2.3 | 
   | nlike_utf8 scalar contains | -2.5 | 
   | or | -2.6 | 
   | add 512 | -2.6 | 
   | struct_array_from_vec 128 | -2.7 | 
   | like_utf8 scalar starts with | -2.9 | 
   | concat i32 1024 | -3.3 | 
   | like_utf8 scalar contains | -3.4 | 
   | limit 512, 512 | -3.4 | 
   | multiply 512 | -3.5 | 
   | cast time32s to time32ms 512 | -3.6 | 
   | array_from_vec 512 | -3.7 | 
   | sort nulls 2^12 | -3.8 | 
   | subtract 512 | -3.9 | 
   | filter context f32 high selectivity | -4.0 | 
   | take bool nulls 1024 | -4.3 | 
   | add_nulls_512 | -4.9 | 
   | array_slice 2048 | -5.0 | 
   | like_utf8 scalar complex | -5.1 | 
   | cast timestamp_ms to timestamp_ns 512 | -5.2 | 
   | sum 512 | -5.3 | 
   | min string 512 | -5.5 | 
   | like_utf8 scalar ends with | -6.1 | 
   | array_string_from_vec 128 | -6.2 | 
   | concat i32 nulls 1024 | -6.2 | 
   | take i32 1024 | -6.5 | 
   | cast time32s to time64us 512 | -6.6 | 
   | take i32 nulls 1024 | -7.1 | 
   | take i32 nulls 512 | -7.2 | 
   | like_utf8 scalar ends with | -7.6 | 
   | like_utf8 scalar equals | -7.7 | 
   | filter context f32 low selectivity | -8.2 | 
   | nlike_utf8 scalar starts with | -8.6 | 
   | take str null values null indices 1024 | -8.7 | 
   | buffer_bit_ops or | -9.0 | 
   | equal_string_nulls_512 | -9.2 | 
   | array_from_vec 256 | -9.3 | 
   | cast int32 to uint32 512 | -9.3 | 
   | nlike_utf8 scalar ends with | -9.8 | 
   | take i32 512 | -9.9 | 
   | filter context u8 w NULLs low selectivity | -10.4 | 
   | array_from_vec 128 | -10.5 | 
   | equal_string_512 | -12.1 | 
   | filter context u8 high selectivity | -12.2 | 
   | struct_array_from_vec 256 | -12.7 | 
   | filter u8 high selectivity | -12.8 | 
   | buffer_bit_ops and | -13.0 | 
   | take bool 1024 | -13.3 | 
   | filter context u8 w NULLs high selectivity | -13.3 | 
   | take str null indices 512 | -14.4 | 
   | take str 512 | -14.5 | 
   | sum nulls 512 | -14.6 | 
   | filter u8 very low selectivity | -15.9 | 
   | concat str nulls 1024 | -18.0 | 
   | take str null values 1024 | -18.9 | 
   | take str null indices 1024 | -18.9 | 
   | cast float32 to int32 512 | -19.0 | 
   | take str 1024 | -19.7 | 
   | struct_array_from_vec 512 | -19.8 | 
   | cast int32 to int64 512 | -20.6 | 
   | filter context u8 very low selectivity | -21.1 | 
   | cast float64 to uint64 512 | -21.1 | 
   | cast int32 to float32 512 | -24.2 | 
   | struct_array_from_vec 1024 | -24.3 | 
   | nlike_utf8 scalar equals | -24.9 | 
   | filter u8 low selectivity | -26.1 | 
   | filter context u8 low selectivity | -26.3 | 
   | concat str 1024 | -27.6 | 
   | cast float64 to float32 512 | -27.6 | 
   | cast date32 to date64 512 | -28.1 | 
   | cast time64ns to time32s 512 | -29.5 | 
   | filter context f32 very low selectivity | -30.1 | 
   | cast int32 to float64 512 | -33.4 | 
   | cast date64 to date32 512 | -37.9 |
   
   # SIMD
   
   ```
   git checkout master
   cargo bench --benches --features simd
   git checkout buffer2
   cargo bench --benches --features simd
   ```
   
   ```
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | like_utf8 scalar equals | 78.2 | 
   | cast date64 to date32 512 | 64.0 | 
   | cast date32 to date64 512 | 49.5 | 
   | cast float64 to float32 512 | 46.1 | 
   | nlike_utf8 scalar starts with | 44.7 | 
   | cast time64ns to time32s 512 | 44.2 | 
   | filter context f32 very low selectivity | 43.9 | 
   | lt scalar Float32 | 39.1 | 
   | lt_eq Float32 | 38.7 | 
   | like_utf8 scalar starts with | 38.4 | 
   | cast int32 to int64 512 | 35.7 | 
   | struct_array_from_vec 1024 | 35.4 | 
   | lt_eq scalar Float32 | 35.4 | 
   | eq scalar Float32 | 34.2 | 
   | neq Float32 | 32.1 | 
   | cast int32 to float64 512 | 31.6 | 
   | concat str 1024 | 31.4 | 
   | gt Float32 | 30.3 | 
   | neq scalar Float32 | 29.8 | 
   | like_utf8 scalar ends with | 29.0 | 
   | filter context u8 very low selectivity | 28.2 | 
   | equal_nulls_512 | 27.3 | 
   | eq Float32 | 27.1 | 
   | struct_array_from_vec 512 | 26.1 | 
   | cast float64 to uint64 512 | 25.9 | 
   | nlike_utf8 scalar ends with | 25.4 | 
   | lt Float32 | 24.8 | 
   | filter context u8 low selectivity | 24.5 | 
   | filter u8 low selectivity | 24.2 | 
   | gt_eq Float32 | 23.6 | 
   | cast time32s to time64us 512 | 23.6 | 
   | nlike_utf8 scalar equals | 23.2 | 
   | cast float32 to int32 512 | 22.5 | 
   | multiply 512 | 21.2 | 
   | buffer_bit_ops and | 20.3 | 
   | gt_eq scalar Float32 | 20.1 | 
   | take str 1024 | 19.5 | 
   | subtract 512 | 19.5 | 
   | cast int32 to float32 512 | 19.0 | 
   | take str null indices 1024 | 19.0 | 
   | take str null values 1024 | 17.4 | 
   | and | 17.4 | 
   | struct_array_from_vec 256 | 16.8 | 
   | or | 16.1 | 
   | not | 15.8 | 
   | take str 512 | 15.1 | 
   | cast int32 to uint32 512 | 14.8 | 
   | add_nulls_512 | 14.2 | 
   | add 512 | 14.0 | 
   | take str null indices 512 | 13.6 | 
   | filter u8 very low selectivity | 12.9 | 
   | filter context u8 w NULLs high selectivity | 12.5 | 
   | gt scalar Float32 | 12.5 | 
   | take i32 512 | 12.5 | 
   | filter context u8 w NULLs low selectivity | 12.4 | 
   | array_from_vec 128 | 10.6 | 
   | concat i32 nulls 1024 | 10.5 | 
   | concat str nulls 1024 | 10.1 | 
   | min string 512 | 9.5 | 
   | equal_string_nulls_512 | 9.2 | 
   | array_from_vec 256 | 9.0 | 
   | take i32 1024 | 8.3 | 
   | filter u8 high selectivity | 8.0 | 
   | take i32 nulls 1024 | 8.0 | 
   | take bool 512 | 7.7 | 
   | array_slice 2048 | 7.6 | 
   | take i32 nulls 512 | 7.5 | 
   | take bool 1024 | 7.2 | 
   | divide_nulls_512 | 7.1 | 
   | take str null values null indices 1024 | 6.8 | 
   | like_utf8 scalar contains | 6.2 | 
   | nlike_utf8 scalar contains | 6.2 | 
   | cast time32s to time32ms 512 | 6.0 | 
   | length | 5.9 | 
   | divide 512 | 5.5 | 
   | array_from_vec 512 | 5.4 | 
   | filter context u8 w NULLs very low selectivity | 5.2 | 
   | filter context u8 high selectivity | 4.8 | 
   | array_string_from_vec 512 | 4.6 | 
   | min nulls string 512 | 4.6 | 
   | min 512 | 4.6 | 
   | buffer_bit_ops or | 4.6 | 
   | sort nulls 2^12 | 4.5 | 
   | array_slice 512 | 4.4 | 
   | concat i32 1024 | 4.0 | 
   | equal_string_512 | 3.5 | 
   | cast timestamp_ms to timestamp_ns 512 | 3.2 | 
   | struct_array_from_vec 128 | 2.8 | 
   | array_string_from_vec 256 | 2.7 | 
   | filter context f32 high selectivity | 2.6 | 
   | array_slice 128 | 2.5 | 
   | limit 512, 512 | 2.2 | 
   | nlike_utf8 scalar complex | 2.2 | 
   | filter context f32 low selectivity | 2.0 | 
   | cast timestamp_ms to i64 512 | 1.9 | 
   | cast int64 to int32 512 | 1.9 | 
   | sort 2^10 | 1.8 | 
   | like_utf8 scalar complex | 1.8 | 
   | equal_string_512 | 1.6 | 
   | sort nulls 2^10 | 1.6 | 
   | equal_512 | 1.2 | 
   | take bool nulls 512 | 1.1 | 
   | array_string_from_vec 128 | 1.1 | 
   | limit 512, 512 | 0.8 | 
   | max nulls 512 | 0.6 | 
   | sort 2^12 | 0.4 | 
   | sum nulls 512 | 0.4 | 
   | min nulls 512 | 0.3 | 
   | max 512 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | -0.2 | 
   | sort 2^12 | -0.3 | 
   | cast int32 to int32 512 | -0.5 | 
   | sum nulls 512 | -0.6 | 
   | like_utf8 scalar complex | -0.6 | 
   | min nulls 512 | -0.9 | 
   | filter context f32 low selectivity | -1.0 | 
   | sort nulls 2^10 | -1.0 | 
   | equal_512 | -1.0 | 
   | array_slice 2048 | -1.1 | 
   | nlike_utf8 scalar complex | -1.1 | 
   | array_string_from_vec 128 | -1.4 | 
   | sort 2^10 | -1.4 | 
   | max nulls 512 | -1.8 | 
   | max 512 | -1.9 | 
   | filter context f32 high selectivity | -2.1 | 
   | take bool nulls 1024 | -2.2 | 
   | concat i32 nulls 1024 | -2.5 | 
   | cast int64 to int32 512 | -2.6 | 
   | take bool nulls 1024 | -2.9 | 
   | buffer_bit_ops or | -3.2 | 
   | array_slice 128 | -3.3 | 
   | array_string_from_vec 256 | -3.7 | 
   | take bool 512 | -4.0 | 
   | array_slice 512 | -4.4 | 
   | filter context u8 high selectivity | -4.5 | 
   | struct_array_from_vec 128 | -4.5 | 
   | filter context u8 w NULLs very low selectivity | -4.6 | 
   | concat i32 1024 | -4.7 | 
   | length | -5.1 | 
   | array_string_from_vec 512 | -5.1 | 
   | min nulls string 512 | -5.3 | 
   | cast time32s to time32ms 512 | -5.3 | 
   | nlike_utf8 scalar contains | -5.6 | 
   | take bool 1024 | -6.1 | 
   | filter u8 high selectivity | -6.2 | 
   | array_from_vec 256 | -6.3 | 
   | equal_string_nulls_512 | -6.7 | 
   | array_from_vec 512 | -7.0 | 
   | min string 512 | -7.0 | 
   | cast timestamp_ms to timestamp_ns 512 | -7.1 | 
   | like_utf8 scalar contains | -7.1 | 
   | take i32 1024 | -7.4 | 
   | take i32 nulls 1024 | -7.7 | 
   | take str null values null indices 1024 | -8.0 | 
   | filter context u8 w NULLs high selectivity | -8.2 | 
   | take i32 nulls 512 | -8.4 | 
   | take i32 512 | -9.1 | 
   | filter u8 very low selectivity | -9.2 | 
   | divide_nulls_512 | -9.2 | 
   | array_from_vec 128 | -9.3 | 
   | divide 512 | -9.4 | 
   | concat str nulls 1024 | -9.4 | 
   | filter context u8 w NULLs low selectivity | -10.7 | 
   | take str null indices 512 | -12.5 | 
   | add 512 | -14.0 | 
   | or | -14.1 | 
   | not | -14.4 | 
   | gt scalar Float32 | -14.6 | 
   | cast int32 to float32 512 | -14.6 | 
   | take str 512 | -14.6 | 
   | struct_array_from_vec 256 | -14.7 | 
   | buffer_bit_ops and | -14.8 | 
   | add_nulls_512 | -14.9 | 
   | and | -15.4 | 
   | take str null values 1024 | -15.5 | 
   | nlike_utf8 scalar starts with | -15.6 | 
   | subtract 512 | -16.0 | 
   | cast int32 to uint32 512 | -16.8 | 
   | gt_eq scalar Float32 | -17.0 | 
   | take str null indices 1024 | -17.3 | 
   | take str 1024 | -17.3 | 
   | cast float32 to int32 512 | -17.6 | 
   | filter context u8 very low selectivity | -18.3 | 
   | multiply 512 | -18.3 | 
   | nlike_utf8 scalar equals | -18.5 | 
   | filter u8 low selectivity | -19.3 | 
   | filter context u8 low selectivity | -19.3 | 
   | concat str 1024 | -19.6 | 
   | lt Float32 | -19.9 | 
   | nlike_utf8 scalar ends with | -20.1 | 
   | gt_eq Float32 | -20.1 | 
   | cast time32s to time64us 512 | -20.2 | 
   | struct_array_from_vec 512 | -20.3 | 
   | equal_nulls_512 | -20.6 | 
   | eq Float32 | -21.3 | 
   | cast float64 to uint64 512 | -21.5 | 
   | cast int32 to float64 512 | -22.5 | 
   | gt Float32 | -23.3 | 
   | like_utf8 scalar ends with | -23.5 | 
   | neq scalar Float32 | -23.5 | 
   | take bool nulls 512 | -23.8 | 
   | neq Float32 | -24.1 | 
   | lt_eq Float32 | -24.4 | 
   | lt_eq scalar Float32 | -25.9 | 
   | eq scalar Float32 | -26.3 | 
   | cast int32 to int64 512 | -26.4 | 
   | struct_array_from_vec 1024 | -27.6 | 
   | like_utf8 scalar starts with | -27.8 | 
   | lt scalar Float32 | -28.1 | 
   | filter context f32 very low selectivity | -30.4 | 
   | cast time64ns to time32s 512 | -30.6 | 
   | cast date32 to date64 512 | -32.2 | 
   | cast float64 to float32 512 | -33.2 | 
   | cast date64 to date32 512 | -39.4 | 
   | like_utf8 scalar equals | -43.8 | 
   | record_batches_to_csv | -52.9 |
   ```



[GitHub] [arrow] jorgecarleitao closed pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao closed pull request #8796:
URL: https://github.com/apache/arrow/pull/8796


   



[GitHub] [arrow] Dandandan commented on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

Dandandan commented on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-736283187


   @jorgecarleitao 
   Maybe I'm saying something weird/impossible, but would it also be possible/beneficial to store the buffer in a `Vec<T>`?
   This could simplify mutation of the buffer for the different types, while also relying less on unsafe code or on code that could segfault or lead to other errors when used incorrectly. In profiling/benchmarks I saw major inefficiencies related to writing values as individual bytes instead of being able to store them directly in the builder API (e.g. in the append function).
   
   For the rest, I think it really makes sense to push this idea forward, as the current implementation is much more complicated without a good reason. I think that with `Vec` it will actually be easier to optimize.
   
   Really look forward to those benchmarks too @alamb 



[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8796: [Rust] [Experiment] Vec vs current allocations

Posted by GitBox <gi...@apache.org>.

jorgecarleitao edited a comment on pull request #8796:
URL: https://github.com/apache/arrow/pull/8796#issuecomment-748470192


   I have now rebased this against master. After @jhorstmann's fix for the out-of-bounds issue in #8954, it now runs correctly.
   
   Here are the results:
   
   # no SIMD
   
   ```
   git checkout master
   cargo bench --benches
   git checkout buffer2
   cargo bench --benches
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | max 512 | 1030.1 | 
   | sum 512 | 972.0 | 
   | min 512 | 928.2 | 
   | max nulls 512 | 298.8 | 
   | min nulls 512 | 284.1 | 
   | gt_eq Float32 | 207.0 | 
   | sum nulls 512 | 201.6 | 
   | eq Float32 | 195.5 | 
   | lt_eq Float32 | 188.5 | 
   | neq scalar Float32 | 163.3 | 
   | divide_nulls_512 | 131.9 | 
   | divide 512 | 130.4 | 
   | lt_eq scalar Float32 | 128.3 | 
   | add 512 | 118.9 | 
   | add_nulls_512 | 115.7 | 
   | record_batches_to_csv | 112.5 | 
   | lt Float32 | 112.2 | 
   | subtract 512 | 107.8 | 
   | neq Float32 | 106.8 | 
   | gt Float32 | 106.4 | 
   | multiply 512 | 105.0 | 
   | eq scalar Float32 | 100.6 | 
   | gt_eq scalar Float32 | 80.8 | 
   | lt scalar Float32 | 66.5 | 
   | cast date64 to date32 512 | 63.2 | 
   | buffer_bit_ops and | 59.8 | 
   | buffer_bit_ops or | 58.2 | 
   | cast int32 to float64 512 | 52.4 | 
   | cast float64 to float32 512 | 43.0 | 
   | cast time64ns to time32s 512 | 42.9 | 
   | cast time32s to time32ms 512 | 41.0 | 
   | filter context f32 very low selectivity | 40.5 | 
   | cast date32 to date64 512 | 37.0 | 
   | gt scalar Float32 | 35.9 | 
   | cast int32 to float32 512 | 35.5 | 
   | struct_array_from_vec 1024 | 32.6 | 
   | lt scalar Float32 | 32.4 | 
   | concat str 1024 | 31.6 | 
   | cast int32 to int64 512 | 29.8 | 
   | cast float64 to uint64 512 | 27.7 | 
   | filter context u8 very low selectivity | 26.8 | 
   | take str 1024 | 26.6 | 
   | concat str nulls 1024 | 26.3 | 
   | take str null indices 1024 | 25.2 | 
   | take str null values 1024 | 25.1 | 
   | struct_array_from_vec 512 | 24.8 | 
   | cast timestamp_ms to timestamp_ns 512 | 20.8 | 
   | cast float32 to int32 512 | 20.3 | 
   | filter u8 very low selectivity | 19.7 | 
   | take str null indices 512 | 19.2 | 
   | take str 512 | 18.7 | 
   | cast time32s to time64us 512 | 17.3 | 
   | nlike_utf8 scalar equals | 16.1 | 
   | take bool 1024 | 15.8 | 
   | struct_array_from_vec 256 | 15.7 | 
   | array_from_vec 128 | 13.6 | 
   | nlike_utf8 scalar ends with | 13.4 | 
   | take i32 nulls 512 | 13.0 | 
   | equal_string_512 | 12.1 | 
   | take i32 512 | 11.7 | 
   | take i32 nulls 1024 | 11.1 | 
   | take str null values null indices 1024 | 10.3 | 
   | take i32 1024 | 10.2 | 
   | filter context u8 low selectivity | 10.0 | 
   | cast int32 to uint32 512 | 9.7 | 
   | filter u8 low selectivity | 9.4 | 
   | array_from_vec 256 | 8.8 | 
   | array_from_vec 512 | 8.0 | 
   | filter context u8 w NULLs high selectivity | 7.9 | 
   | filter u8 high selectivity | 7.9 | 
   | length | 7.5 | 
   | filter context u8 w NULLs low selectivity | 7.4 | 
   | min string 512 | 6.9 | 
   | equal_nulls_512 | 6.7 | 
   | filter context u8 high selectivity | 6.6 | 
   | like_utf8 scalar equals | 6.5 | 
   | array_string_from_vec 128 | 6.5 | 
   | like_utf8 scalar complex | 6.0 | 
   | filter context f32 high selectivity | 5.6 | 
   | min nulls 512 | 5.3 | 
   | like_utf8 scalar starts with | 5.2 | 
   | like_utf8 scalar contains | 5.1 | 
   | divide 512 | 4.8 | 
   | nlike_utf8 scalar contains | 4.8 | 
   | take bool nulls 1024 | 4.6 | 
   | concat i32 1024 | 4.5 | 
   | divide_nulls_512 | 4.4 | 
   | struct_array_from_vec 128 | 4.2 | 
   | min nulls string 512 | 4.2 | 
   | array_string_from_vec 256 | 3.4 | 
   | equal_string_nulls_512 | 2.7 | 
   | and | 2.6 | 
   | or | 2.4 | 
   | filter context u8 w NULLs very low selectivity | 2.3 | 
   | take bool nulls 512 | 2.2 | 
   | equal_512 | 2.0 | 
   | filter context f32 low selectivity | 1.8 | 
   | cast int64 to int32 512 | 1.5 | 
   | sort 2^12 | 1.5 | 
   | take bool 512 | 1.4 | 
   | min nulls string 512 | 1.4 | 
   | limit 512, 512 | 1.4 | 
   | not | 1.2 | 
   | array_string_from_vec 512 | 1.1 | 
   | nlike_utf8 scalar complex | 0.9 | 
   | gt_eq scalar Float32 | 0.9 | 
   | eq scalar Float32 | 0.7 | 
   | cast int64 to int32 512 | 0.7 | 
   | cast int32 to int32 512 | 0.5 | 
   | cast int32 to int32 512 | 0.5 | 
   | array_string_from_vec 512 | 0.5 | 
   | sort 2^10 | 0.4 | 
   | cast timestamp_ns to timestamp_s 512 | 0.4 | 
   | neq scalar Float32 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | 0.2 | 
   | gt Float32 | -0.3 | 
   | min 512 | -0.3 | 
   | lt Float32 | -0.4 | 
   | sort nulls 2^12 | -0.4 | 
   | take bool nulls 512 | -0.6 | 
   | sort 2^12 | -0.6 | 
   | not | -0.8 | 
   | array_string_from_vec 256 | -0.9 | 
   | sort nulls 2^10 | -1.1 | 
   | equal_512 | -1.4 | 
   | array_slice 128 | -1.5 | 
   | nlike_utf8 scalar starts with | -1.5 | 
   | max nulls 512 | -1.7 | 
   | array_slice 512 | -1.7 | 
   | filter context u8 w NULLs very low selectivity | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.8 | 
   | cast timestamp_ms to i64 512 | -1.9 | 
   | length | -2.0 | 
   | array_slice 512 | -2.0 | 
   | nlike_utf8 scalar complex | -2.1 | 
   | and | -2.3 | 
   | nlike_utf8 scalar contains | -2.5 | 
   | or | -2.6 | 
   | add 512 | -2.6 | 
   | struct_array_from_vec 128 | -2.7 | 
   | like_utf8 scalar starts with | -2.9 | 
   | concat i32 1024 | -3.3 | 
   | like_utf8 scalar contains | -3.4 | 
   | limit 512, 512 | -3.4 | 
   | multiply 512 | -3.5 | 
   | cast time32s to time32ms 512 | -3.6 | 
   | array_from_vec 512 | -3.7 | 
   | sort nulls 2^12 | -3.8 | 
   | subtract 512 | -3.9 | 
   | filter context f32 high selectivity | -4.0 | 
   | take bool nulls 1024 | -4.3 | 
   | add_nulls_512 | -4.9 | 
   | array_slice 2048 | -5.0 | 
   | like_utf8 scalar complex | -5.1 | 
   | cast timestamp_ms to timestamp_ns 512 | -5.2 | 
   | sum 512 | -5.3 | 
   | min string 512 | -5.5 | 
   | like_utf8 scalar ends with | -6.1 | 
   | array_string_from_vec 128 | -6.2 | 
   | concat i32 nulls 1024 | -6.2 | 
   | take i32 1024 | -6.5 | 
   | cast time32s to time64us 512 | -6.6 | 
   | take i32 nulls 1024 | -7.1 | 
   | take i32 nulls 512 | -7.2 | 
   | like_utf8 scalar ends with | -7.6 | 
   | like_utf8 scalar equals | -7.7 | 
   | filter context f32 low selectivity | -8.2 | 
   | nlike_utf8 scalar starts with | -8.6 | 
   | take str null values null indices 1024 | -8.7 | 
   | buffer_bit_ops or | -9.0 | 
   | equal_string_nulls_512 | -9.2 | 
   | array_from_vec 256 | -9.3 | 
   | cast int32 to uint32 512 | -9.3 | 
   | nlike_utf8 scalar ends with | -9.8 | 
   | take i32 512 | -9.9 | 
   | filter context u8 w NULLs low selectivity | -10.4 | 
   | array_from_vec 128 | -10.5 | 
   | equal_string_512 | -12.1 | 
   | filter context u8 high selectivity | -12.2 | 
   | struct_array_from_vec 256 | -12.7 | 
   | filter u8 high selectivity | -12.8 | 
   | buffer_bit_ops and | -13.0 | 
   | take bool 1024 | -13.3 | 
   | filter context u8 w NULLs high selectivity | -13.3 | 
   | take str null indices 512 | -14.4 | 
   | take str 512 | -14.5 | 
   | sum nulls 512 | -14.6 | 
   | filter u8 very low selectivity | -15.9 | 
   | concat str nulls 1024 | -18.0 | 
   | take str null values 1024 | -18.9 | 
   | take str null indices 1024 | -18.9 | 
   | cast float32 to int32 512 | -19.0 | 
   | take str 1024 | -19.7 | 
   | struct_array_from_vec 512 | -19.8 | 
   | cast int32 to int64 512 | -20.6 | 
   | filter context u8 very low selectivity | -21.1 | 
   | cast float64 to uint64 512 | -21.1 | 
   | cast int32 to float32 512 | -24.2 | 
   | struct_array_from_vec 1024 | -24.3 | 
   | nlike_utf8 scalar equals | -24.9 | 
   | filter u8 low selectivity | -26.1 | 
   | filter context u8 low selectivity | -26.3 | 
   | concat str 1024 | -27.6 | 
   | cast float64 to float32 512 | -27.6 | 
   | cast date32 to date64 512 | -28.1 | 
   | cast time64ns to time32s 512 | -29.5 | 
   | filter context f32 very low selectivity | -30.1 | 
   | cast int32 to float64 512 | -33.4 | 
   | cast date64 to date32 512 | -37.9 |
   
   # SIMD
   
   ```
   git checkout master
   cargo bench --benches --features simd
   git checkout buffer2
   cargo bench --benches --features simd
   ```
   
   |  benchmark | variation (%) |
   |-------------- | -------------- | 
   | like_utf8 scalar equals | 78.2 | 
   | cast date64 to date32 512 | 64.0 | 
   | cast date32 to date64 512 | 49.5 | 
   | cast float64 to float32 512 | 46.1 | 
   | nlike_utf8 scalar starts with | 44.7 | 
   | cast time64ns to time32s 512 | 44.2 | 
   | filter context f32 very low selectivity | 43.9 | 
   | lt scalar Float32 | 39.1 | 
   | lt_eq Float32 | 38.7 | 
   | like_utf8 scalar starts with | 38.4 | 
   | cast int32 to int64 512 | 35.7 | 
   | struct_array_from_vec 1024 | 35.4 | 
   | lt_eq scalar Float32 | 35.4 | 
   | eq scalar Float32 | 34.2 | 
   | neq Float32 | 32.1 | 
   | cast int32 to float64 512 | 31.6 | 
   | concat str 1024 | 31.4 | 
   | gt Float32 | 30.3 | 
   | neq scalar Float32 | 29.8 | 
   | like_utf8 scalar ends with | 29.0 | 
   | filter context u8 very low selectivity | 28.2 | 
   | equal_nulls_512 | 27.3 | 
   | eq Float32 | 27.1 | 
   | struct_array_from_vec 512 | 26.1 | 
   | cast float64 to uint64 512 | 25.9 | 
   | nlike_utf8 scalar ends with | 25.4 | 
   | lt Float32 | 24.8 | 
   | filter context u8 low selectivity | 24.5 | 
   | filter u8 low selectivity | 24.2 | 
   | gt_eq Float32 | 23.6 | 
   | cast time32s to time64us 512 | 23.6 | 
   | nlike_utf8 scalar equals | 23.2 | 
   | cast float32 to int32 512 | 22.5 | 
   | multiply 512 | 21.2 | 
   | buffer_bit_ops and | 20.3 | 
   | gt_eq scalar Float32 | 20.1 | 
   | take str 1024 | 19.5 | 
   | subtract 512 | 19.5 | 
   | cast int32 to float32 512 | 19.0 | 
   | take str null indices 1024 | 19.0 | 
   | take str null values 1024 | 17.4 | 
   | and | 17.4 | 
   | struct_array_from_vec 256 | 16.8 | 
   | or | 16.1 | 
   | not | 15.8 | 
   | take str 512 | 15.1 | 
   | cast int32 to uint32 512 | 14.8 | 
   | add_nulls_512 | 14.2 | 
   | add 512 | 14.0 | 
   | take str null indices 512 | 13.6 | 
   | filter u8 very low selectivity | 12.9 | 
   | filter context u8 w NULLs high selectivity | 12.5 | 
   | gt scalar Float32 | 12.5 | 
   | take i32 512 | 12.5 | 
   | filter context u8 w NULLs low selectivity | 12.4 | 
   | array_from_vec 128 | 10.6 | 
   | concat i32 nulls 1024 | 10.5 | 
   | concat str nulls 1024 | 10.1 | 
   | min string 512 | 9.5 | 
   | equal_string_nulls_512 | 9.2 | 
   | array_from_vec 256 | 9.0 | 
   | take i32 1024 | 8.3 | 
   | filter u8 high selectivity | 8.0 | 
   | take i32 nulls 1024 | 8.0 | 
   | take bool 512 | 7.7 | 
   | array_slice 2048 | 7.6 | 
   | take i32 nulls 512 | 7.5 | 
   | take bool 1024 | 7.2 | 
   | divide_nulls_512 | 7.1 | 
   | take str null values null indices 1024 | 6.8 | 
   | like_utf8 scalar contains | 6.2 | 
   | nlike_utf8 scalar contains | 6.2 | 
   | cast time32s to time32ms 512 | 6.0 | 
   | length | 5.9 | 
   | divide 512 | 5.5 | 
   | array_from_vec 512 | 5.4 | 
   | filter context u8 w NULLs very low selectivity | 5.2 | 
   | filter context u8 high selectivity | 4.8 | 
   | array_string_from_vec 512 | 4.6 | 
   | min nulls string 512 | 4.6 | 
   | min 512 | 4.6 | 
   | buffer_bit_ops or | 4.6 | 
   | sort nulls 2^12 | 4.5 | 
   | array_slice 512 | 4.4 | 
   | concat i32 1024 | 4.0 | 
   | equal_string_512 | 3.5 | 
   | cast timestamp_ms to timestamp_ns 512 | 3.2 | 
   | struct_array_from_vec 128 | 2.8 | 
   | array_string_from_vec 256 | 2.7 | 
   | filter context f32 high selectivity | 2.6 | 
   | array_slice 128 | 2.5 | 
   | limit 512, 512 | 2.2 | 
   | nlike_utf8 scalar complex | 2.2 | 
   | filter context f32 low selectivity | 2.0 | 
   | cast timestamp_ms to i64 512 | 1.9 | 
   | cast int64 to int32 512 | 1.9 | 
   | sort 2^10 | 1.8 | 
   | like_utf8 scalar complex | 1.8 | 
   | equal_string_512 | 1.6 | 
   | sort nulls 2^10 | 1.6 | 
   | equal_512 | 1.2 | 
   | take bool nulls 512 | 1.1 | 
   | array_string_from_vec 128 | 1.1 | 
   | limit 512, 512 | 0.8 | 
   | max nulls 512 | 0.6 | 
   | sort 2^12 | 0.4 | 
   | sum nulls 512 | 0.4 | 
   | min nulls 512 | 0.3 | 
   | max 512 | 0.3 | 
   | cast timestamp_ns to timestamp_s 512 | -0.2 | 
   | sort 2^12 | -0.3 | 
   | cast int32 to int32 512 | -0.5 | 
   | sum nulls 512 | -0.6 | 
   | like_utf8 scalar complex | -0.6 | 
   | min nulls 512 | -0.9 | 
   | filter context f32 low selectivity | -1.0 | 
   | sort nulls 2^10 | -1.0 | 
   | equal_512 | -1.0 | 
   | array_slice 2048 | -1.1 | 
   | nlike_utf8 scalar complex | -1.1 | 
   | array_string_from_vec 128 | -1.4 | 
   | sort 2^10 | -1.4 | 
   | max nulls 512 | -1.8 | 
   | max 512 | -1.9 | 
   | filter context f32 high selectivity | -2.1 | 
   | take bool nulls 1024 | -2.2 | 
   | concat i32 nulls 1024 | -2.5 | 
   | cast int64 to int32 512 | -2.6 | 
   | take bool nulls 1024 | -2.9 | 
   | buffer_bit_ops or | -3.2 | 
   | array_slice 128 | -3.3 | 
   | array_string_from_vec 256 | -3.7 | 
   | take bool 512 | -4.0 | 
   | array_slice 512 | -4.4 | 
   | filter context u8 high selectivity | -4.5 | 
   | struct_array_from_vec 128 | -4.5 | 
   | filter context u8 w NULLs very low selectivity | -4.6 | 
   | concat i32 1024 | -4.7 | 
   | length | -5.1 | 
   | array_string_from_vec 512 | -5.1 | 
   | min nulls string 512 | -5.3 | 
   | cast time32s to time32ms 512 | -5.3 | 
   | nlike_utf8 scalar contains | -5.6 | 
   | take bool 1024 | -6.1 | 
   | filter u8 high selectivity | -6.2 | 
   | array_from_vec 256 | -6.3 | 
   | equal_string_nulls_512 | -6.7 | 
   | array_from_vec 512 | -7.0 | 
   | min string 512 | -7.0 | 
   | cast timestamp_ms to timestamp_ns 512 | -7.1 | 
   | like_utf8 scalar contains | -7.1 | 
   | take i32 1024 | -7.4 | 
   | take i32 nulls 1024 | -7.7 | 
   | take str null values null indices 1024 | -8.0 | 
   | filter context u8 w NULLs high selectivity | -8.2 | 
   | take i32 nulls 512 | -8.4 | 
   | take i32 512 | -9.1 | 
   | filter u8 very low selectivity | -9.2 | 
   | divide_nulls_512 | -9.2 | 
   | array_from_vec 128 | -9.3 | 
   | divide 512 | -9.4 | 
   | concat str nulls 1024 | -9.4 | 
   | filter context u8 w NULLs low selectivity | -10.7 | 
   | take str null indices 512 | -12.5 | 
   | add 512 | -14.0 | 
   | or | -14.1 | 
   | not | -14.4 | 
   | gt scalar Float32 | -14.6 | 
   | cast int32 to float32 512 | -14.6 | 
   | take str 512 | -14.6 | 
   | struct_array_from_vec 256 | -14.7 | 
   | buffer_bit_ops and | -14.8 | 
   | add_nulls_512 | -14.9 | 
   | and | -15.4 | 
   | take str null values 1024 | -15.5 | 
   | nlike_utf8 scalar starts with | -15.6 | 
   | subtract 512 | -16.0 | 
   | cast int32 to uint32 512 | -16.8 | 
   | gt_eq scalar Float32 | -17.0 | 
   | take str null indices 1024 | -17.3 | 
   | take str 1024 | -17.3 | 
   | cast float32 to int32 512 | -17.6 | 
   | filter context u8 very low selectivity | -18.3 | 
   | multiply 512 | -18.3 | 
   | nlike_utf8 scalar equals | -18.5 | 
   | filter u8 low selectivity | -19.3 | 
   | filter context u8 low selectivity | -19.3 | 
   | concat str 1024 | -19.6 | 
   | lt Float32 | -19.9 | 
   | nlike_utf8 scalar ends with | -20.1 | 
   | gt_eq Float32 | -20.1 | 
   | cast time32s to time64us 512 | -20.2 | 
   | struct_array_from_vec 512 | -20.3 | 
   | equal_nulls_512 | -20.6 | 
   | eq Float32 | -21.3 | 
   | cast float64 to uint64 512 | -21.5 | 
   | cast int32 to float64 512 | -22.5 | 
   | gt Float32 | -23.3 | 
   | like_utf8 scalar ends with | -23.5 | 
   | neq scalar Float32 | -23.5 | 
   | take bool nulls 512 | -23.8 | 
   | neq Float32 | -24.1 | 
   | lt_eq Float32 | -24.4 | 
   | lt_eq scalar Float32 | -25.9 | 
   | eq scalar Float32 | -26.3 | 
   | cast int32 to int64 512 | -26.4 | 
   | struct_array_from_vec 1024 | -27.6 | 
   | like_utf8 scalar starts with | -27.8 | 
   | lt scalar Float32 | -28.1 | 
   | filter context f32 very low selectivity | -30.4 | 
   | cast time64ns to time32s 512 | -30.6 | 
   | cast date32 to date64 512 | -32.2 | 
   | cast float64 to float32 512 | -33.2 | 
   | cast date64 to date32 512 | -39.4 | 
   | like_utf8 scalar equals | -43.8 | 
   | record_batches_to_csv | -52.9 |
   

