You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/30 06:48:57 UTC

[GitHub] [arrow-rs] ritchie46 opened a new pull request #382: make sure that only concat preallocates buffers

ritchie46 opened a new pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382


   # Which issue does this PR close?
   
   Partial fix for #347. In #348  @jorgecarleitao pointed out that the memory savings don't work for the `filter` and the `zip` kernel. 
   This PR restores the implementation for those two kernels, and keeps the new `preallocation` for the `concat` kernel. 
   
   This is achieved by create a builder pattern for `MutableArrayData`. This way we don't break the API, and we may choose to `preallocate` buffers.
   
   Could you take a look at this @jorgecarleitao ?
   
   # Are there any user-facing changes?
   
   There is an builder pattern struct for `MutableArrayData`.
   
   # If there are any breaking changes to public APIs, please add the `breaking change` label.
   
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9e9bef5) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **increase** coverage by `0.00%`.
   > The diff coverage is `96.87%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #382   +/-   ##
   =======================================
     Coverage   82.61%   82.61%           
   =======================================
     Files         162      162           
     Lines       44228    44247   +19     
   =======================================
   + Hits        36538    36556   +18     
   - Misses       7690     7691    +1     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `89.11% <94.11%> (-0.07%)` | :arrow_down: |
   | [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `94.85% <0.00%> (-0.20%)` | :arrow_down: |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...9e9bef5](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644175597



##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
     } else {
         buffer.push(0i32)
     }
-    let str_values_size = arrays
-        .iter()
-        .map(|data| {
-            // get the length of the value buffer
-            let buf_len = data.buffers()[1].len();
-            // find the offset of the buffer
-            // this returns a slice of offsets, starting from the offset of the array
-            // so we can take the first value
-            let offset = data.buffer::<Offset>(0)[0];
-            buf_len - offset.to_usize().unwrap()
-        })
-        .sum::<usize>();
 
     [
         buffer,
-        MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+        MutableBuffer::new(binary_size * mem::size_of::<u8>()),
     ]
 }
 
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+    /// Binary, Utf8 and LargeUtf8 data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the binary/ str buffer
+    Binary(usize, Option<usize>),
+    /// List and LargeList data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the child data
+    List(usize, Option<Box<Capacities>>),
+    /// Struct type
+    /// * the capacity of the array
+    /// * the capacities of the fields
+    Struct(usize, Option<Vec<Capacities>>),
+    /// Dictionary type
+    /// * the capacity of the array

Review comment:
       What is the difference here between the "capacity of the array" and "the capacity of the keys"?
   
   I have a rather special usage of concat for dictionary arrays, where I can guarantee that the dictionary values of all arrays are the same, or at least that the dictionary of the last array includes all previous dictionaries in the same order. I wonder whether that could be supported here and also fix the current concatenation logic that could lead to duplicated dictionary values. The rough idea would be to have another field here with an
   ```
   enum DictionaryMergeStrategy {
       Append(Option<usize>), // current logic, just append dictionary values with an optional capacity
       Merge(Option<usize>), // actually merge the dictionaries using a hashmap, to be imlemented in a followup
       UseLastValues, // use the values of the last dictionary array, might want to validate that this has a size at least as large as all other dictionaries so all keys are guaranteed to be in range
   }
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2f318d4) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **increase** coverage by `0.00%`.
   > The diff coverage is `96.66%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@           Coverage Diff           @@
   ##           master     #382   +/-   ##
   =======================================
     Coverage   82.61%   82.62%           
   =======================================
     Files         162      162           
     Lines       44228    44247   +19     
   =======================================
   + Hits        36538    36557   +19     
     Misses       7690     7690           
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `89.11% <93.33%> (-0.07%)` | :arrow_down: |
   | [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...2f318d4](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644266958



##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
     } else {
         buffer.push(0i32)
     }
-    let str_values_size = arrays
-        .iter()
-        .map(|data| {
-            // get the length of the value buffer
-            let buf_len = data.buffers()[1].len();
-            // find the offset of the buffer
-            // this returns a slice of offsets, starting from the offset of the array
-            // so we can take the first value
-            let offset = data.buffer::<Offset>(0)[0];
-            buf_len - offset.to_usize().unwrap()
-        })
-        .sum::<usize>();
 
     [
         buffer,
-        MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+        MutableBuffer::new(binary_size * mem::size_of::<u8>()),
     ]
 }
 
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+    /// Binary, Utf8 and LargeUtf8 data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the binary/ str buffer
+    Binary(usize, Option<usize>),
+    /// List and LargeList data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the child data
+    List(usize, Option<Box<Capacities>>),
+    /// Struct type
+    /// * the capacity of the array
+    /// * the capacities of the fields
+    Struct(usize, Option<Vec<Capacities>>),
+    /// Dictionary type
+    /// * the capacity of the array

Review comment:
       I haven't thought this through fully yet. Maybe the caller should actually specify the merged dictionary values array instead of a strategy. And that, or specifying the correct strategy, requires a special concat kernel anyway, so it's basically the same situation as now. I think you do not need to consider this use case in this PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850967804


   @jorgecarleitao I implemented your proposal from #347. As we now need to define an `enum` in the `with_capacities`  constructor, I want to be extra certain that I added all possible variants, as adding one later is backwards incompatable. Could you check if I missed something.
   
   For now, only the `(large)-utf8` ones uses this, but if the design is ok, I can add more in later PRs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb merged pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
alamb merged pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857165876


   Included in #411 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644194518



##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
     } else {
         buffer.push(0i32)
     }
-    let str_values_size = arrays
-        .iter()
-        .map(|data| {
-            // get the length of the value buffer
-            let buf_len = data.buffers()[1].len();
-            // find the offset of the buffer
-            // this returns a slice of offsets, starting from the offset of the array
-            // so we can take the first value
-            let offset = data.buffer::<Offset>(0)[0];
-            buf_len - offset.to_usize().unwrap()
-        })
-        .sum::<usize>();
 
     [
         buffer,
-        MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+        MutableBuffer::new(binary_size * mem::size_of::<u8>()),
     ]
 }
 
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+    /// Binary, Utf8 and LargeUtf8 data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the binary/ str buffer
+    Binary(usize, Option<usize>),
+    /// List and LargeList data types
+    /// Define
+    /// * the capacity of the array offsets
+    /// * the capacity of the child data
+    List(usize, Option<Box<Capacities>>),
+    /// Struct type
+    /// * the capacity of the array
+    /// * the capacities of the fields
+    Struct(usize, Option<Vec<Capacities>>),
+    /// Dictionary type
+    /// * the capacity of the array

Review comment:
       > What is the difference here between the "capacity of the array" and "the capacity of the keys"?
   
   Yes, that can be the same.
   
   I also understand the use of defining the merge strategy. Regarding  the API, should we make that available in a different method. Similar to `HashMap` having a method:
   
   ```rust
   HashMap::with_capacity(...)
   // and
   HashMap::with_capacity_and_hasher(...)
   ```
   
   We could do
   
   ```rust
   MutableArrayData::with_capacies(...)
   // and
   MutableArrayData::with_capacities_and_merge_strategy(...)
   ```
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (531a5f3) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **decrease** coverage by `0.04%`.
   > The diff coverage is `59.45%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #382      +/-   ##
   ==========================================
   - Coverage   82.61%   82.57%   -0.05%     
   ==========================================
     Files         162      162              
     Lines       44228    44280      +52     
   ==========================================
   + Hits        36538    36563      +25     
   - Misses       7690     7717      +27     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `85.97% <49.15%> (-3.22%)` | :arrow_down: |
   | [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
   | [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `94.85% <0.00%> (-0.20%)` | :arrow_down: |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...531a5f3](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857137179


   @jorgecarleitao  what is the status of this one? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2e47165) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **decrease** coverage by `0.00%`.
   > The diff coverage is `83.92%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #382      +/-   ##
   ==========================================
   - Coverage   82.61%   82.60%   -0.01%     
   ==========================================
     Files         162      162              
     Lines       44228    44253      +25     
   ==========================================
   + Hits        36538    36556      +18     
   - Misses       7690     7697       +7     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `88.42% <83.63%> (-0.76%)` | :arrow_down: |
   | [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...2e47165](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb edited a comment on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857165876


   Included in #411 cherry pick


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jorgecarleitao commented on pull request #382: make sure that only concat preallocates buffers

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850961132


   Thanks @ritchie46 . My small concerns with this PR:
   
   1. We are introducing yet another builder API for something relatively easy to accomplish with an extra method
   2. `preallocate_buffers` assumes that "preallocate" means allocate the maximum possible capacity, which IMO is not really pre-allocating, is just being very conservative and allocating the maximum.
   3. The API is rather limited to the `concatenate` case.
   
   I left some ideas as a comment on the issue
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org