You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/01/26 19:00:55 UTC

[GitHub] [arrow] jhorstmann opened a new pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

jhorstmann opened a new pull request #9330:
URL: https://github.com/apache/arrow/pull/9330


   Another experiement for speeding up `ArrayData`. Most array types only contain a single buffer (except for the validity buffer) and also at most one child data object. I think the only types this does not apply to are struct and union arrays. Using a `SmallVec` trades one comparison against an allocation and the resulting indirection in those cases.
   
   The `ArrayData::new` method is left in place unchanged for compatibility.
   
   TODO:
   
   - [ ] Use the optimized `new_smallvec` method in kernels whenever applicable
   - [ ] Run all benchmarks on a non-throttling machine
   
   @jorgecarleitao @Dandandan this is also related to the discussion on #9271. I tried this initially some month ago and the benchmarks were not conclusive, but might be worth trying again or in combination with removing the Arc indirection.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#issuecomment-767765332


   <!--
     Licensed to the Apache Software Foundation (ASF) under one
     or more contributor license agreements.  See the NOTICE file
     distributed with this work for additional information
     regarding copyright ownership.  The ASF licenses this file
     to you under the Apache License, Version 2.0 (the
     "License"); you may not use this file except in compliance
     with the License.  You may obtain a copy of the License at
   
       http://www.apache.org/licenses/LICENSE-2.0
   
     Unless required by applicable law or agreed to in writing,
     software distributed under the License is distributed on an
     "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
     KIND, either express or implied.  See the License for the
     specific language governing permissions and limitations
     under the License.
   -->
   
   Thanks for opening a pull request!
   
   Could you open an issue for this pull request on JIRA?
   https://issues.apache.org/jira/browse/ARROW
   
   Then could you also rename pull request title in the following format?
   
       ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}
   
   See also:
   
     * [Other pull requests](https://github.com/apache/arrow/pulls/)
     * [Contribution Guidelines - How to contribute patches](https://arrow.apache.org/docs/developers/contributing.html#how-to-contribute-patches)
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#discussion_r568465456



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -225,11 +226,11 @@ pub struct ArrayData {
     /// The buffers for this array data. Note that depending on the array types, this
     /// could hold different kinds of buffers (e.g., value buffer, value offset buffer)
     /// at different positions.
-    buffers: Vec<Buffer>,
+    buffers: SmallVec<[Buffer; 1]>,

Review comment:
       Good idea! I haven't checked the memory layout of both, but in the common case of 1 buffer this could inline better (in the absence of LTO). Since we already have the validity bitmap in a separate Option, the variants are `Zero`, `One` and `Many`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb closed pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #9330:
URL: https://github.com/apache/arrow/pull/9330


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#issuecomment-789647553


   @jhorstmann  I am closing this PR for the time being to clean up the Rust/Arrow PR backlog.  Please let me know if this is a mistake


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorgecarleitao commented on a change in pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on a change in pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#discussion_r566793765



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -225,11 +226,11 @@ pub struct ArrayData {
     /// The buffers for this array data. Note that depending on the array types, this
     /// could hold different kinds of buffers (e.g., value buffer, value offset buffer)
     /// at different positions.
-    buffers: Vec<Buffer>,
+    buffers: SmallVec<[Buffer; 1]>,

Review comment:
       fyi, max is 3 including the null buffer, so max is 2 in this context. See `MutableDataArray`, where we use two buffers on the stack to avoid the overhead. When unused, a buffer takes something like a `usize` + a pointer.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#discussion_r566785994



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -225,11 +226,11 @@ pub struct ArrayData {
     /// The buffers for this array data. Note that depending on the array types, this
     /// could hold different kinds of buffers (e.g., value buffer, value offset buffer)
     /// at different positions.
-    buffers: Vec<Buffer>,
+    buffers: SmallVec<[Buffer; 1]>,

Review comment:
       I wonder if using something like a specific enum might also work (and be more memory efficient) as well:
   
   ```
   pub enum  ArrayBuffer {
     One(ArrayDataRef),
     Two(ArrayDataRef, ArrayDataRef),
     Many(Vec<ArrayDataRef>)
   }
   ```
   
   I honestly don't know what the maximum number of buffers any specific ArrayData can have, 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jhorstmann commented on a change in pull request #9330: [Rust] [Experiment] [WIP]: Use SmallVec in ArrayData to optimize the common usecase of single-buffer arrays

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #9330:
URL: https://github.com/apache/arrow/pull/9330#discussion_r568465456



##########
File path: rust/arrow/src/array/data.rs
##########
@@ -225,11 +226,11 @@ pub struct ArrayData {
     /// The buffers for this array data. Note that depending on the array types, this
     /// could hold different kinds of buffers (e.g., value buffer, value offset buffer)
     /// at different positions.
-    buffers: Vec<Buffer>,
+    buffers: SmallVec<[Buffer; 1]>,

Review comment:
       Good idea! I haven't checked the memory layout of both, but in the common case of 1 buffer this could inline better (in the absence of LTO). Since we already have the validity bitmap in a separate Option, the variants are `Zero`, `One` and `Many`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org