Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/23 15:40:34 UTC

[GitHub] [arrow-rs] tustvold opened a new issue #1229: Add MutableArrayData::extend_ranges

tustvold opened a new issue #1229:
URL: https://github.com/apache/arrow-rs/issues/1229


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   `MutableArrayData` is created from one or more `ArrayData` and copies rows from those source arrays into a destination array. For each source array it constructs the two boxed functions below, which copy a range of values from that array's null mask and data respectively.
   
   ```
   type ExtendNullBits<'a> = Box<dyn Fn(&mut _MutableArrayData, usize, usize) + 'a>;
   type Extend<'a> = Box<dyn Fn(&mut _MutableArrayData, usize, usize, usize) + 'a>;
   ```
   
   It also constructs:
   
   ```
   type ExtendNulls = Box<dyn Fn(&mut _MutableArrayData, usize)>;
   ```
   
   This can be used to append null values to the in-progress array.
   
   Users don't call these boxed functions directly. Instead they call `MutableArrayData::extend` or `MutableArrayData::extend_nulls`, which in turn call the appropriate boxed function.
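   
   The dispatch described above can be sketched with a toy model. This is only an illustration, not the arrow-rs implementation: `ToyState`, `ToyExtend`, `build_toy_extend` and `ToyMutable` are hypothetical names, the buffer holds plain `i32` values, and null handling is left out entirely.

   ```rust
   // Toy stand-in for `_MutableArrayData`: just a growable buffer of values.
   struct ToyState {
       values: Vec<i32>,
   }

   // Mirrors the shape of `Extend<'a>` above: (state, source index, start, len).
   type ToyExtend<'a> = Box<dyn Fn(&mut ToyState, usize, usize, usize) + 'a>;

   // Hypothetical builder: one boxed closure per source, capturing that source.
   fn build_toy_extend(source: &[i32]) -> ToyExtend<'_> {
       Box::new(
           move |state: &mut ToyState, _index: usize, start: usize, len: usize| {
               state.values.extend_from_slice(&source[start..start + len]);
           },
       )
   }

   struct ToyMutable<'a> {
       state: ToyState,
       extend_values: Vec<ToyExtend<'a>>,
   }

   impl<'a> ToyMutable<'a> {
       fn new(sources: &[&'a [i32]]) -> Self {
           let extend_values = sources.iter().map(|&s| build_toy_extend(s)).collect();
           Self {
               state: ToyState { values: Vec::new() },
               extend_values,
           }
       }

       // Copies `sources[index][start..end]`: one dynamic call per range.
       fn extend(&mut self, index: usize, start: usize, end: usize) {
           (self.extend_values[index])(&mut self.state, index, start, end - start);
       }
   }
   ```

   In this sketch every contiguous range costs one call through a `Box<dyn Fn>`, however few rows the range contains.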
   
   This works well for kernels such as `concat`, which call `MutableArrayData` with large ranges. However, it performs poorly in kernels such as `take` and `filter`, where the contiguous ranges may be very small and the cost of each call is spread over only a few rows.
   
   **Describe the solution you'd like**
   
   Modify the signatures of these functions to accept a slice of ranges, and add `MutableArrayData::extend_ranges(&mut self, index: usize, ranges: &[Range<usize>])`.
   
   This will not only amortise the cost of calling the extend functions across many ranges, but will also allow implementations to use more performant gather operations where possible.
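   
   A minimal sketch of what a batched variant might look like, again on a toy model rather than the real `MutableArrayData`. All names here (`ToyState`, `ToyExtendRanges`, `build_toy_extend_ranges`, `ToyMutable`) are hypothetical, and only the `extend_ranges(index, ranges)` shape is taken from the proposal above.

   ```rust
   use std::ops::Range;

   // Toy stand-in for `_MutableArrayData`: just a growable buffer of values.
   struct ToyState {
       values: Vec<i32>,
   }

   // Hypothetical batched signature: a slice of ranges instead of (start, len).
   type ToyExtendRanges<'a> = Box<dyn Fn(&mut ToyState, usize, &[Range<usize>]) + 'a>;

   fn build_toy_extend_ranges(source: &[i32]) -> ToyExtendRanges<'_> {
       Box::new(
           move |state: &mut ToyState, _index: usize, ranges: &[Range<usize>]| {
               // Reserve once for the whole batch instead of growing per range.
               let total: usize = ranges.iter().map(|r| r.len()).sum();
               state.values.reserve(total);
               for r in ranges {
                   state.values.extend_from_slice(&source[r.clone()]);
               }
           },
       )
   }

   struct ToyMutable<'a> {
       state: ToyState,
       extend_ranges: Vec<ToyExtendRanges<'a>>,
   }

   impl<'a> ToyMutable<'a> {
       fn new(sources: &[&'a [i32]]) -> Self {
           let extend_ranges = sources
               .iter()
               .map(|&s| build_toy_extend_ranges(s))
               .collect();
           Self {
               state: ToyState { values: Vec::new() },
               extend_ranges,
           }
       }

       // One dynamic call covers every range in the slice.
       fn extend_ranges(&mut self, index: usize, ranges: &[Range<usize>]) {
           (self.extend_ranges[index])(&mut self.state, index, ranges);
       }
   }
   ```

   The dynamic dispatch and the capacity reservation happen once per batch here, and the inner loop is the place where a type-specific implementation could substitute a gather.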
   
   **Describe alternatives you've considered**
   
   In future we may also want to support passing bitmasks down instead of ranges.
   
   **Additional context**
   
   The `Filter` returned by `build_filter`, which is used when filtering a record batch with more than one column, already computes a `Vec` of ranges, so this would be effectively free.
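   
   For illustration, turning a boolean filter mask into contiguous ranges could look roughly like the sketch below. `mask_to_ranges` is a hypothetical helper operating on a plain `&[bool]`; it is not how `build_filter` is implemented, only an example of the kind of range list that could be handed to `extend_ranges`.

   ```rust
   use std::ops::Range;

   // Hypothetical helper: collect half-open ranges covering each run of `true`.
   fn mask_to_ranges(mask: &[bool]) -> Vec<Range<usize>> {
       let mut ranges = Vec::new();
       let mut run_start: Option<usize> = None;
       for (i, &keep) in mask.iter().enumerate() {
           match (keep, run_start) {
               // A run of selected rows begins here.
               (true, None) => run_start = Some(i),
               // The current run ends just before this row.
               (false, Some(start)) => {
                   ranges.push(start..i);
                   run_start = None;
               }
               // Still inside a run, or still between runs.
               _ => {}
           }
       }
       // Close a run that extends to the end of the mask.
       if let Some(start) = run_start {
           ranges.push(start..mask.len());
       }
       ranges
   }
   ```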
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on issue #1229: Add MutableArrayData::extend_ranges

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1229:
URL: https://github.com/apache/arrow-rs/issues/1229#issuecomment-1019516203


   Linking https://github.com/apache/arrow-datafusion/issues/416 and https://github.com/apache/arrow-datafusion/issues/1572 as, at least historically, `MutableArrayData` was one of the bottlenecks in `SortPreservingMerge`, in particular in the code used to reconstruct a record batch: https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_plan/sorts/sort_preserving_merge.rs#L452
   
   FYI @yjshen @alamb 
   
   





[GitHub] [arrow-rs] tustvold commented on issue #1229: Add MutableArrayData::extend_ranges

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1229:
URL: https://github.com/apache/arrow-rs/issues/1229#issuecomment-1024512838


   Having thought about this a bit more, `filter` would likely be better off with specialized implementations, as these can elide range checks and similar overhead. I'm going to take a stab at that and see what I can come up with.
   
   I'll leave this ticket open as it may still aid `SortPreservingMerge`.
   
   

