You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/23 18:38:03 UTC
[GitHub] [arrow] tyrelr opened a new pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
tyrelr opened a new pull request #9304:
URL: https://github.com/apache/arrow/pull/9304
create a from_iter implementation for Buffer to collect boolean flags into a buffer that will:
* pre-allocate the appropriate size based on iterator site_hint()
* 'buffer' a byte in-memory until it's complete (at which point it is written)
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] Dandandan commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
Dandandan commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563279700
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
I think you could merge the `iterator.next()` loops here.
and maybe do the bit masking loop like this:
```rust
loop {
let mut byte_accum = 0;
let mut mask = 1;
while mask <= 0b10000000 {
if let Some(value) = iterator.next() {
byte_accum |= match value {
true => mask,
false => 0,
};
mask <<= 1;
} else {
exhausted = true;
break;
}
}
}
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766189316
Big things I haven't tried that MIGHT give some gains would be:
* manually unrolling the loop for the last 7 bits
* relying on iter<Item=[bool;8]> in the API (but that would require callers to construct a slice of bools rather than an iter of bools)
** related to this, I have no idea if there are simd primitives that could be of use here
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563393453
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
After making that change, it seems like this could be achieved without requiring any changes to the previous Buffer API's: The byte-accumulation loop could be exposed as a PackedBoolIterator that wraps an Iter<bool>, exposing it as Iter<u8>(). I'm not sure if that would a better or worse approach...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
codecov-io commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=h1) Report
> Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=desc) (0e75531) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.01%`.
> The diff coverage is `88.37%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9304/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9304 +/- ##
==========================================
+ Coverage 81.84% 81.85% +0.01%
==========================================
Files 215 215
Lines 52949 53035 +86
==========================================
+ Hits 43336 43412 +76
- Misses 9613 9623 +10
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `92.84% <88.09%> (-0.84%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.00% <100.00%> (+0.01%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=footer). Last update [67d0c2e...0e75531](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766177143
The 10% impacts are:
```
critcmp master-67d0c2e3 bool-38d4e395 -t 10
group bool-38d4e395 master-67d0c2e3
----- ------------- ---------------
add_nulls_512 1.30 343.7±0.62ns ? B/sec 1.00 264.4±1.80ns ? B/sec
cast float32 to int32 512 1.13 3.2±0.01µs ? B/sec 1.00 2.8±0.01µs ? B/sec
concat str nulls 1024 1.10 21.1±0.06µs ? B/sec 1.00 19.1±0.11µs ? B/sec
eq Float32 1.00 89.0±0.10µs ? B/sec 1.42 126.0±0.28µs ? B/sec
eq scalar Float32 1.11 78.0±0.12µs ? B/sec 1.00 70.2±0.11µs ? B/sec
gt scalar Float32 1.36 71.8±0.15µs ? B/sec 1.00 52.7±0.09µs ? B/sec
gt_eq Float32 1.00 75.5±0.12µs ? B/sec 1.65 124.6±0.22µs ? B/sec
lt scalar Float32 1.15 71.1±0.15µs ? B/sec 1.00 62.0±0.51µs ? B/sec
lt_eq Float32 1.00 74.3±0.11µs ? B/sec 1.66 123.5±0.28µs ? B/sec
lt_eq scalar Float32 1.00 61.5±0.56µs ? B/sec 1.45 88.9±0.22µs ? B/sec
multiply 512 1.00 262.0±0.38ns ? B/sec 1.32 345.1±4.37ns ? B/sec
neq scalar Float32 1.20 78.6±0.81µs ? B/sec 1.00 65.5±0.65µs ? B/sec
or 1.00 1590.0±2.45ns ? B/sec 1.11 1760.7±14.78ns ? B/sec
subtract 512 1.43 369.9±5.24ns ? B/sec 1.00 258.4±0.40ns ? B/sec
take i32 1024 1.10 1871.4±3.67ns ? B/sec 1.00 1695.3±4.85ns ? B/sec
take i32 512 1.00 928.5±1.79ns ? B/sec 1.12 1043.4±13.97ns ? B/sec
take i32 nulls 512 1.10 1074.7±1.81ns ? B/sec 1.00 975.5±1.60ns ? B/sec
```
So it's hard to see through the noise... but I think it looks performance neutral? (I don't think take/subtract/multiply/concat/cast/add interact with this api at all, so those should just be noise).
The only other kernel that I think might be able to take advantage of this might be 'take'... I believe it does bit-by-bit buffer to copy the 'taken' null values... But I haven't looked at whether it actually would benefit.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-768406002
For me, this is ready to merge, @tyrelr: excellent job here and thanks a lot for this.
Could you rebase it against the latest master?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563336446
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
That approach seems to be worse in the micro-benchmark, but better for the comparison kernel.
```
critcmp master-67d0c2e3 bool-38d4e395 bool-11dd14af -t 10
group bool-11dd14af bool-38d4e395 master-67d0c2e3
----- ------------- ------------- ---------------
Buffer::from_iter bool 1.66 10.2±0.02ms ? B/sec 1.00 6.2±0.01ms ? B/sec
MutableBuffer iter bitset 1.00 56.0±0.11ms ? B/sec 1.12 62.5±0.13ms ? B/sec
MutableBuffer::from_iter bool 1.45 8.9±0.02ms ? B/sec 1.00 6.1±0.01ms ? B/sec
add 512 1.06 225.1±1.05ns ? B/sec 1.00 212.9±0.56ns ? B/sec 1.38 293.0±1.54ns ? B/sec
array_from_vec 256 1.00 629.6±2.56ns ? B/sec 1.03 649.4±1.78ns ? B/sec 1.11 701.7±1.29ns ? B/sec
cast int64 to int32 512 1.00 2.2±0.01µs ? B/sec 1.14 2.5±0.01µs ? B/sec 1.02 2.3±0.01µs ? B/sec
cast time32s to time32ms 512 1.23 901.3±2.65ns ? B/sec 1.01 736.1±1.17ns ? B/sec 1.00 731.5±1.15ns ? B/sec
cast timestamp_ms to timestamp_ns 512 1.17 1211.8±2.48ns ? B/sec 1.00 1036.8±1.71ns ? B/sec 1.11 1149.7±2.41ns ? B/sec
eq Float32 1.00 84.6±0.08µs ? B/sec 1.05 89.2±0.28µs ? B/sec 1.49 125.9±0.23µs ? B/sec
eq scalar Float32 1.02 71.2±0.08µs ? B/sec 1.12 78.2±0.18µs ? B/sec 1.00 69.8±0.07µs ? B/sec
from_slice 1.00 488.4±0.92µs ? B/sec 1.81 883.0±1.48µs ? B/sec 1.76 861.2±1.55µs ? B/sec
from_slice prepared 1.00 480.7±1.00µs ? B/sec 1.15 552.1±1.44µs ? B/sec 1.18 564.8±0.85µs ? B/sec
gt scalar Float32 1.24 64.8±0.06µs ? B/sec 1.37 71.5±0.07µs ? B/sec 1.00 52.4±0.06µs ? B/sec
gt_eq Float32 1.00 72.3±0.26µs ? B/sec 1.04 75.3±0.09µs ? B/sec 1.72 124.2±0.09µs ? B/sec
lt scalar Float32 1.07 65.5±0.09µs ? B/sec 1.15 70.9±0.13µs ? B/sec 1.00 61.4±0.09µs ? B/sec
lt_eq Float32 1.00 72.0±0.07µs ? B/sec 1.03 74.2±0.08µs ? B/sec 1.71 123.1±0.23µs ? B/sec
lt_eq scalar Float32 1.00 58.0±0.10µs ? B/sec 1.05 61.1±0.07µs ? B/sec 1.52 88.3±0.15µs ? B/sec
min nulls 512 1.11 2.1±0.01µs ? B/sec 1.03 1961.9±13.05ns ? B/sec 1.00 1908.7±19.95ns ? B/sec
multiply 512 1.34 350.1±0.66ns ? B/sec 1.30 339.1±0.32ns ? B/sec 1.00 261.6±0.30ns ? B/sec
mutable 3.79 1630.3±2.29µs ? B/sec 1.00 429.8±1.01µs ? B/sec 1.02 436.7±1.19µs ? B/sec
mutable extend 2.45 2.0±0.00ms ? B/sec 1.00 835.7±3.31µs ? B/sec 1.01 846.4±1.40µs ? B/sec
mutable iter extend_from_slice 2.20 2.2±0.00ms ? B/sec 1.00 1007.3±1.72µs ? B/sec
mutable str nulls 1024 1.16 5.4±0.01ms ? B/sec 1.00 4.7±0.01ms ? B/sec 1.05 4.9±0.01ms ? B/sec
neq scalar Float32 1.12 71.3±0.12µs ? B/sec 1.23 78.2±0.08µs ? B/sec 1.00 63.6±0.18µs ? B/sec
not 1.13 1145.6±1.91ns ? B/sec 1.13 1146.5±1.20ns ? B/sec 1.00 1014.1±0.90ns ? B/sec
subtract 512 1.02 258.8±1.15ns ? B/sec 1.40 354.9±0.66ns ? B/sec
```
I may try playing with the baselines to see if I'm missing a black-box that's causing an unrealistic optimization...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=h1) Report
> Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=desc) (38d4e39) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.00%`.
> The diff coverage is `91.37%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9304/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9304 +/- ##
=======================================
Coverage 81.84% 81.85%
=======================================
Files 215 215
Lines 52949 53007 +58
=======================================
+ Hits 43336 43388 +52
- Misses 9613 9619 +6
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.40% <91.07%> (-0.28%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.00% <100.00%> (+0.01%)` | :arrow_up: |
| [rust/parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9wYXJxdWV0L3NyYy9lbmNvZGluZ3MvZW5jb2RpbmcucnM=) | `95.24% <0.00%> (-0.20%)` | :arrow_down: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=footer). Last update [67d0c2e...38d4e39](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563393453
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
After making that change, it seems like this could be achieved without requiring any changes to the previous Buffer API's: The byte-accumulation loop could be exposed as a ```PackedBoolIterator``` that wraps an ```Iterator<Item=bool>```, exposing it as ``Iterator<Item=u8>()```. I'm not sure if that would a better or worse approach...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563336446
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
That approach seems to be worse in the micro-benchmark, but better for the comparison kernel.
```
critcmp master-67d0c2e3 bool-38d4e395 bool-11dd14af -t 10
group bool-11dd14af bool-38d4e395 master-67d0c2e3
----- ------------- ------------- ---------------
Buffer::from_iter bool 1.66 10.2±0.02ms ? B/sec 1.00 6.2±0.01ms ? B/sec
MutableBuffer iter bitset 1.00 56.0±0.11ms ? B/sec 1.12 62.5±0.13ms ? B/sec
MutableBuffer::from_iter bool 1.45 8.9±0.02ms ? B/sec 1.00 6.1±0.01ms ? B/sec
add 512 1.06 225.1±1.05ns ? B/sec 1.00 212.9±0.56ns ? B/sec 1.38 293.0±1.54ns ? B/sec
array_from_vec 256 1.00 629.6±2.56ns ? B/sec 1.03 649.4±1.78ns ? B/sec 1.11 701.7±1.29ns ? B/sec
cast int64 to int32 512 1.00 2.2±0.01µs ? B/sec 1.14 2.5±0.01µs ? B/sec 1.02 2.3±0.01µs ? B/sec
cast time32s to time32ms 512 1.23 901.3±2.65ns ? B/sec 1.01 736.1±1.17ns ? B/sec 1.00 731.5±1.15ns ? B/sec
cast timestamp_ms to timestamp_ns 512 1.17 1211.8±2.48ns ? B/sec 1.00 1036.8±1.71ns ? B/sec 1.11 1149.7±2.41ns ? B/sec
eq Float32 1.00 84.6±0.08µs ? B/sec 1.05 89.2±0.28µs ? B/sec 1.49 125.9±0.23µs ? B/sec
eq scalar Float32 1.02 71.2±0.08µs ? B/sec 1.12 78.2±0.18µs ? B/sec 1.00 69.8±0.07µs ? B/sec
from_slice 1.00 488.4±0.92µs ? B/sec 1.81 883.0±1.48µs ? B/sec 1.76 861.2±1.55µs ? B/sec
from_slice prepared 1.00 480.7±1.00µs ? B/sec 1.15 552.1±1.44µs ? B/sec 1.18 564.8±0.85µs ? B/sec
gt scalar Float32 1.24 64.8±0.06µs ? B/sec 1.37 71.5±0.07µs ? B/sec 1.00 52.4±0.06µs ? B/sec
gt_eq Float32 1.00 72.3±0.26µs ? B/sec 1.04 75.3±0.09µs ? B/sec 1.72 124.2±0.09µs ? B/sec
lt scalar Float32 1.07 65.5±0.09µs ? B/sec 1.15 70.9±0.13µs ? B/sec 1.00 61.4±0.09µs ? B/sec
lt_eq Float32 1.00 72.0±0.07µs ? B/sec 1.03 74.2±0.08µs ? B/sec 1.71 123.1±0.23µs ? B/sec
lt_eq scalar Float32 1.00 58.0±0.10µs ? B/sec 1.05 61.1±0.07µs ? B/sec 1.52 88.3±0.15µs ? B/sec
min nulls 512 1.11 2.1±0.01µs ? B/sec 1.03 1961.9±13.05ns ? B/sec 1.00 1908.7±19.95ns ? B/sec
multiply 512 1.34 350.1±0.66ns ? B/sec 1.30 339.1±0.32ns ? B/sec 1.00 261.6±0.30ns ? B/sec
mutable 3.79 1630.3±2.29µs ? B/sec 1.00 429.8±1.01µs ? B/sec 1.02 436.7±1.19µs ? B/sec
mutable extend 2.45 2.0±0.00ms ? B/sec 1.00 835.7±3.31µs ? B/sec 1.01 846.4±1.40µs ? B/sec
mutable iter extend_from_slice 2.20 2.2±0.00ms ? B/sec 1.00 1007.3±1.72µs ? B/sec
mutable str nulls 1024 1.16 5.4±0.01ms ? B/sec 1.00 4.7±0.01ms ? B/sec 1.05 4.9±0.01ms ? B/sec
neq scalar Float32 1.12 71.3±0.12µs ? B/sec 1.23 78.2±0.08µs ? B/sec 1.00 63.6±0.18µs ? B/sec
not 1.13 1145.6±1.91ns ? B/sec 1.13 1146.5±1.20ns ? B/sec 1.00 1014.1±0.90ns ? B/sec
subtract 512 1.02 258.8±1.15ns ? B/sec 1.40 354.9±0.66ns ? B/sec
```
I may try playing with the baselines to see if I'm missing a black-box that's causing an unrealistic optimization...
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
That approach seems to be worse in the micro-benchmark, but better for the comparison kernel.
```
critcmp master-67d0c2e3 bool-38d4e395 bool-11dd14af -t 10
group bool-11dd14af bool-38d4e395 master-67d0c2e3
----- ------------- ------------- ---------------
Buffer::from_iter bool 1.66 10.2±0.02ms ? B/sec 1.00 6.2±0.01ms ? B/sec
MutableBuffer iter bitset 1.00 56.0±0.11ms ? B/sec 1.12 62.5±0.13ms ? B/sec
MutableBuffer::from_iter bool 1.45 8.9±0.02ms ? B/sec 1.00 6.1±0.01ms ? B/sec
add 512 1.06 225.1±1.05ns ? B/sec 1.00 212.9±0.56ns ? B/sec 1.38 293.0±1.54ns ? B/sec
array_from_vec 256 1.00 629.6±2.56ns ? B/sec 1.03 649.4±1.78ns ? B/sec 1.11 701.7±1.29ns ? B/sec
cast int64 to int32 512 1.00 2.2±0.01µs ? B/sec 1.14 2.5±0.01µs ? B/sec 1.02 2.3±0.01µs ? B/sec
cast time32s to time32ms 512 1.23 901.3±2.65ns ? B/sec 1.01 736.1±1.17ns ? B/sec 1.00 731.5±1.15ns ? B/sec
cast timestamp_ms to timestamp_ns 512 1.17 1211.8±2.48ns ? B/sec 1.00 1036.8±1.71ns ? B/sec 1.11 1149.7±2.41ns ? B/sec
eq Float32 1.00 84.6±0.08µs ? B/sec 1.05 89.2±0.28µs ? B/sec 1.49 125.9±0.23µs ? B/sec
eq scalar Float32 1.02 71.2±0.08µs ? B/sec 1.12 78.2±0.18µs ? B/sec 1.00 69.8±0.07µs ? B/sec
from_slice 1.00 488.4±0.92µs ? B/sec 1.81 883.0±1.48µs ? B/sec 1.76 861.2±1.55µs ? B/sec
from_slice prepared 1.00 480.7±1.00µs ? B/sec 1.15 552.1±1.44µs ? B/sec 1.18 564.8±0.85µs ? B/sec
gt scalar Float32 1.24 64.8±0.06µs ? B/sec 1.37 71.5±0.07µs ? B/sec 1.00 52.4±0.06µs ? B/sec
gt_eq Float32 1.00 72.3±0.26µs ? B/sec 1.04 75.3±0.09µs ? B/sec 1.72 124.2±0.09µs ? B/sec
lt scalar Float32 1.07 65.5±0.09µs ? B/sec 1.15 70.9±0.13µs ? B/sec 1.00 61.4±0.09µs ? B/sec
lt_eq Float32 1.00 72.0±0.07µs ? B/sec 1.03 74.2±0.08µs ? B/sec 1.71 123.1±0.23µs ? B/sec
lt_eq scalar Float32 1.00 58.0±0.10µs ? B/sec 1.05 61.1±0.07µs ? B/sec 1.52 88.3±0.15µs ? B/sec
min nulls 512 1.11 2.1±0.01µs ? B/sec 1.03 1961.9±13.05ns ? B/sec 1.00 1908.7±19.95ns ? B/sec
multiply 512 1.34 350.1±0.66ns ? B/sec 1.30 339.1±0.32ns ? B/sec 1.00 261.6±0.30ns ? B/sec
mutable 3.79 1630.3±2.29µs ? B/sec 1.00 429.8±1.01µs ? B/sec 1.02 436.7±1.19µs ? B/sec
mutable extend 2.45 2.0±0.00ms ? B/sec 1.00 835.7±3.31µs ? B/sec 1.01 846.4±1.40µs ? B/sec
mutable iter extend_from_slice 2.20 2.2±0.00ms ? B/sec 1.00 1007.3±1.72µs ? B/sec
mutable str nulls 1024 1.16 5.4±0.01ms ? B/sec 1.00 4.7±0.01ms ? B/sec 1.05 4.9±0.01ms ? B/sec
neq scalar Float32 1.12 71.3±0.12µs ? B/sec 1.23 78.2±0.08µs ? B/sec 1.00 63.6±0.18µs ? B/sec
not 1.13 1145.6±1.91ns ? B/sec 1.13 1146.5±1.20ns ? B/sec 1.00 1014.1±0.90ns ? B/sec
subtract 512 1.02 258.8±1.15ns ? B/sec 1.40 354.9±0.66ns ? B/sec
```
I may try playing with the benchmarks to see if I'm missing a black-box that's causing an unrealistic optimization...
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
After making that change, it seems like this could be achieved without requiring any changes to the previous Buffer API's: The byte-accumulation loop could be exposed as a PackedBoolIterator that wraps an Iter<bool>, exposing it as Iter<u8>(). I'm not sure if that would a better or worse approach...
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
After making that change, it seems like this could be achieved without requiring any changes to the previous Buffer API's: The byte-accumulation loop could be exposed as a ```PackedBoolIterator``` that wraps an ```Iterator<Item=bool>```, exposing it as ``Iterator<Item=u8>()```. I'm not sure if that would a better or worse approach...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766172368
Hi. This looks good. I tried this some time ago but was unable to make it efficient. Do you have the benchmarks for the comparison kernels?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563185541
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1260,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
Review comment:
I prefer this implementation to the 'Buffer' implementation, since it has less pointer juggling.
Originally, it was hair slower... but with the MutableBuffer improvements, it appears to be a bit faster... so that's good.
```
group bool-ecd919bde
----- --------------
Buffer::from_iter bool 1.00 7.2±0.06ms ? B/sec
MutableBuffer iter bitset 1.00 62.1±0.05ms ? B/sec
MutableBuffer::from_iter bool 1.00 6.3±0.01ms ? B/sec
```
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766158211
https://issues.apache.org/jira/browse/ARROW-11361
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] codecov-io edited a comment on pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
codecov-io edited a comment on pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#issuecomment-766160749
# [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=h1) Report
> Merging [#9304](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=desc) (466c41b) into [master](https://codecov.io/gh/apache/arrow/commit/67d0c2e38011cd883059e3a9fd0ea08088661707?el=desc) (67d0c2e) will **increase** coverage by `0.01%`.
> The diff coverage is `91.37%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow/pull/9304/graphs/tree.svg?width=650&height=150&src=pr&token=LpTCFbqVT1)](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree)
```diff
@@ Coverage Diff @@
## master #9304 +/- ##
==========================================
+ Coverage 81.84% 81.85% +0.01%
==========================================
Files 215 215
Lines 52949 53007 +58
==========================================
+ Hits 43336 43389 +53
- Misses 9613 9618 +5
```
| [Impacted Files](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [rust/arrow/src/buffer.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvYnVmZmVyLnJz) | `93.40% <91.07%> (-0.28%)` | :arrow_down: |
| [rust/arrow/src/compute/kernels/comparison.rs](https://codecov.io/gh/apache/arrow/pull/9304/diff?src=pr&el=tree#diff-cnVzdC9hcnJvdy9zcmMvY29tcHV0ZS9rZXJuZWxzL2NvbXBhcmlzb24ucnM=) | `96.00% <100.00%> (+0.01%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=footer). Last update [67d0c2e...466c41b](https://codecov.io/gh/apache/arrow/pull/9304?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563185759
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -241,6 +241,67 @@ impl<T: AsRef<[u8]>> From<T> for Buffer {
}
}
+/// Creating a `Buffer` instance by storing the boolean values into the buffer
+impl std::iter::FromIterator<bool> for Buffer {
+ fn from_iter<I>(iter: I) -> Self
Review comment:
TODO: forward to the MutableBuffer implementation, since it doesn't look like there will be a performance penalty.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] jorgecarleitao closed pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #9304:
URL: https://github.com/apache/arrow/pull/9304
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] tyrelr commented on a change in pull request #9304: ARROW-11361: [Rust] Build MutableBuffer/Buffer from iterator of bools
Posted by GitBox <gi...@apache.org>.
tyrelr commented on a change in pull request #9304:
URL: https://github.com/apache/arrow/pull/9304#discussion_r563336446
##########
File path: rust/arrow/src/buffer.rs
##########
@@ -1188,6 +1209,57 @@ impl Drop for SetLenOnDrop<'_> {
}
}
+/// Creating a `MutableBuffer` instance by setting bits according to the boolean values
+impl std::iter::FromIterator<bool> for MutableBuffer {
+ fn from_iter<I>(iter: I) -> Self
+ where
+ I: IntoIterator<Item = bool>,
+ {
+ let mut iterator = iter.into_iter();
+ let mut result = {
+ let byte_capacity: usize = iterator.size_hint().0.saturating_add(7) / 8;
+ MutableBuffer::new(byte_capacity)
+ };
+
+ while let Some(value) = iterator.next() {
+ //we have the first bit of the byte
+ let mut exhausted = false;
+ let mut byte_accum: u8 = match value {
Review comment:
That approach seems to be worse in the micro-benchmark, but better for the comparison kernel.
```
critcmp master-67d0c2e3 bool-38d4e395 bool-11dd14af -t 10
group bool-11dd14af bool-38d4e395 master-67d0c2e3
----- ------------- ------------- ---------------
Buffer::from_iter bool 1.66 10.2±0.02ms ? B/sec 1.00 6.2±0.01ms ? B/sec
MutableBuffer iter bitset 1.00 56.0±0.11ms ? B/sec 1.12 62.5±0.13ms ? B/sec
MutableBuffer::from_iter bool 1.45 8.9±0.02ms ? B/sec 1.00 6.1±0.01ms ? B/sec
add 512 1.06 225.1±1.05ns ? B/sec 1.00 212.9±0.56ns ? B/sec 1.38 293.0±1.54ns ? B/sec
array_from_vec 256 1.00 629.6±2.56ns ? B/sec 1.03 649.4±1.78ns ? B/sec 1.11 701.7±1.29ns ? B/sec
cast int64 to int32 512 1.00 2.2±0.01µs ? B/sec 1.14 2.5±0.01µs ? B/sec 1.02 2.3±0.01µs ? B/sec
cast time32s to time32ms 512 1.23 901.3±2.65ns ? B/sec 1.01 736.1±1.17ns ? B/sec 1.00 731.5±1.15ns ? B/sec
cast timestamp_ms to timestamp_ns 512 1.17 1211.8±2.48ns ? B/sec 1.00 1036.8±1.71ns ? B/sec 1.11 1149.7±2.41ns ? B/sec
eq Float32 1.00 84.6±0.08µs ? B/sec 1.05 89.2±0.28µs ? B/sec 1.49 125.9±0.23µs ? B/sec
eq scalar Float32 1.02 71.2±0.08µs ? B/sec 1.12 78.2±0.18µs ? B/sec 1.00 69.8±0.07µs ? B/sec
from_slice 1.00 488.4±0.92µs ? B/sec 1.81 883.0±1.48µs ? B/sec 1.76 861.2±1.55µs ? B/sec
from_slice prepared 1.00 480.7±1.00µs ? B/sec 1.15 552.1±1.44µs ? B/sec 1.18 564.8±0.85µs ? B/sec
gt scalar Float32 1.24 64.8±0.06µs ? B/sec 1.37 71.5±0.07µs ? B/sec 1.00 52.4±0.06µs ? B/sec
gt_eq Float32 1.00 72.3±0.26µs ? B/sec 1.04 75.3±0.09µs ? B/sec 1.72 124.2±0.09µs ? B/sec
lt scalar Float32 1.07 65.5±0.09µs ? B/sec 1.15 70.9±0.13µs ? B/sec 1.00 61.4±0.09µs ? B/sec
lt_eq Float32 1.00 72.0±0.07µs ? B/sec 1.03 74.2±0.08µs ? B/sec 1.71 123.1±0.23µs ? B/sec
lt_eq scalar Float32 1.00 58.0±0.10µs ? B/sec 1.05 61.1±0.07µs ? B/sec 1.52 88.3±0.15µs ? B/sec
min nulls 512 1.11 2.1±0.01µs ? B/sec 1.03 1961.9±13.05ns ? B/sec 1.00 1908.7±19.95ns ? B/sec
multiply 512 1.34 350.1±0.66ns ? B/sec 1.30 339.1±0.32ns ? B/sec 1.00 261.6±0.30ns ? B/sec
mutable 3.79 1630.3±2.29µs ? B/sec 1.00 429.8±1.01µs ? B/sec 1.02 436.7±1.19µs ? B/sec
mutable extend 2.45 2.0±0.00ms ? B/sec 1.00 835.7±3.31µs ? B/sec 1.01 846.4±1.40µs ? B/sec
mutable iter extend_from_slice 2.20 2.2±0.00ms ? B/sec 1.00 1007.3±1.72µs ? B/sec
mutable str nulls 1024 1.16 5.4±0.01ms ? B/sec 1.00 4.7±0.01ms ? B/sec 1.05 4.9±0.01ms ? B/sec
neq scalar Float32 1.12 71.3±0.12µs ? B/sec 1.23 78.2±0.08µs ? B/sec 1.00 63.6±0.18µs ? B/sec
not 1.13 1145.6±1.91ns ? B/sec 1.13 1146.5±1.20ns ? B/sec 1.00 1014.1±0.90ns ? B/sec
subtract 512 1.02 258.8±1.15ns ? B/sec 1.40 354.9±0.66ns ? B/sec
```
I may try playing with the benchmarks to see if I'm missing a black-box that's causing an unrealistic optimization...
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org