You are viewing a plain text version of this content. The canonical link for it is here.

Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/31 10:47:13 UTC

[GitHub] [arrow-rs] yordan-pavlov commented on a change in pull request #384: Implement faster arrow array reader

yordan-pavlov commented on a change in pull request #384:
URL: https://github.com/apache/arrow-rs/pull/384#discussion_r642397656



##########
File path: arrow/src/compute/kernels/filter.rs
##########
@@ -59,19 +59,14 @@ pub(crate) struct SlicesIterator<'a> {
 }
 
 impl<'a> SlicesIterator<'a> {
-    pub(crate) fn new(filter: &'a BooleanArray) -> Self {
+    pub fn new(filter: &'a BooleanArray) -> Self {
         let values = &filter.data_ref().buffers()[0];
-
-        // this operation is performed before iteration
-        // because it is fast and allows reserving all the needed memory
-        let filter_count = values.count_set_bits_offset(filter.offset(), filter.len());

Review comment:
       good question @Dandandan , in my opinion calculating `filter_count` should not be done in the `SlicesIterator` because it's not used here. It's just a convenience for many of the clients of the `SlicesIterator`. Also having `filter_count` calculated in `SlicesIterator::new` is inflexible and in the use case of the new `ArrowArrayReader` would have meant that counting would be performed twice unnecessarily. That's why I have moved it to a `filter_count()` method instead - keep this convenience for users of `SlicesIterator`, but make it more flexible and allow more use-cases. Where I have had to change existing code, I was careful to only invoke `filter_count()` a single time. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org