You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/27 08:05:32 UTC

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3198: Use SlicesIterator for ArrayData Equality

tustvold commented on code in PR #3198:
URL: https://github.com/apache/arrow-rs/pull/3198#discussion_r1032884837


##########
arrow-data/src/equal/primitive.rs:
##########
@@ -15,13 +15,17 @@
 // specific language governing permissions and limitations
 // under the License.
 
+use crate::contains_nulls;
+use arrow_buffer::bit_util::get_bit;
 use std::mem::size_of;
 
-use crate::data::{contains_nulls, ArrayData};
-use arrow_buffer::bit_util::get_bit;
+use crate::data::ArrayData;
+use crate::slices_iterator::SlicesIterator;
 
 use super::utils::equal_len;
 
+pub(crate) const NULL_SLICES_SELECTIVITY_THRESHOLD: f64 = 0.4;

Review Comment:
   How did you come up with 0.4, not saying it is a bad choice, just curious



##########
arrow-data/src/slices_iterator.rs:
##########
@@ -0,0 +1,40 @@
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::bit_iterator::BitSliceIterator;
+use arrow_buffer::Buffer;
+
+/// An iterator of `(usize, usize)` each representing an interval
+/// `[start, end)` whose slots of a bitmap [Buffer] are true. Each
+/// interval corresponds to a contiguous region of memory to be
+/// "taken" from an array to be filtered.
+///
+/// ## Notes:
+///
+/// Only performant for filters that copy across long contiguous runs
+#[derive(Debug)]
+pub struct SlicesIterator<'a>(BitSliceIterator<'a>);

Review Comment:
   Moving this is a breaking change, as it is exposed publicly - https://docs.rs/arrow-select/latest/arrow_select/filter/struct.SlicesIterator.html



##########
arrow-data/src/slices_iterator.rs:
##########
@@ -0,0 +1,40 @@
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::bit_iterator::BitSliceIterator;
+use arrow_buffer::Buffer;
+
+/// An iterator of `(usize, usize)` each representing an interval
+/// `[start, end)` whose slots of a bitmap [Buffer] are true. Each
+/// interval corresponds to a contiguous region of memory to be
+/// "taken" from an array to be filtered.
+///
+/// ## Notes:
+///
+/// Only performant for filters that copy across long contiguous runs
+#[derive(Debug)]
+pub struct SlicesIterator<'a>(BitSliceIterator<'a>);
+
+impl<'a> SlicesIterator<'a> {
+    pub fn new_from_buffer(values: &'a Buffer, offset: usize, len: usize) -> Self {
+        Self(BitSliceIterator::new(values, offset, len))

Review Comment:
   At this point why not just use `BitSliceIterator`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org