You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/30 06:48:57 UTC
[GitHub] [arrow-rs] ritchie46 opened a new pull request #382: make sure that only concat preallocates buffers
ritchie46 opened a new pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382
# Which issue does this PR close?
Partial fix for #347. In #348 @jorgecarleitao pointed out that the memory savings don't work for the `filter` and the `zip` kernel.
This PR restores the implementation for those two kernels, and keeps the new `preallocation` for the `concat` kernel.
This is achieved by create a builder pattern for `MutableArrayData`. This way we don't break the API, and we may choose to `preallocate` buffers.
Could you take a look at this @jorgecarleitao ?
# Are there any user-facing changes?
There is an builder pattern struct for `MutableArrayData`.
# If there are any breaking changes to public APIs, please add the `breaking change` label.
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (9e9bef5) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **increase** coverage by `0.00%`.
> The diff coverage is `96.87%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #382 +/- ##
=======================================
Coverage 82.61% 82.61%
=======================================
Files 162 162
Lines 44228 44247 +19
=======================================
+ Hits 36538 36556 +18
- Misses 7690 7691 +1
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `89.11% <94.11%> (-0.07%)` | :arrow_down: |
| [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
| [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `94.85% <0.00%> (-0.20%)` | :arrow_down: |
| [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...9e9bef5](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644175597
##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
} else {
buffer.push(0i32)
}
- let str_values_size = arrays
- .iter()
- .map(|data| {
- // get the length of the value buffer
- let buf_len = data.buffers()[1].len();
- // find the offset of the buffer
- // this returns a slice of offsets, starting from the offset of the array
- // so we can take the first value
- let offset = data.buffer::<Offset>(0)[0];
- buf_len - offset.to_usize().unwrap()
- })
- .sum::<usize>();
[
buffer,
- MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+ MutableBuffer::new(binary_size * mem::size_of::<u8>()),
]
}
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+ /// Binary, Utf8 and LargeUtf8 data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the binary/ str buffer
+ Binary(usize, Option<usize>),
+ /// List and LargeList data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the child data
+ List(usize, Option<Box<Capacities>>),
+ /// Struct type
+ /// * the capacity of the array
+ /// * the capacities of the fields
+ Struct(usize, Option<Vec<Capacities>>),
+ /// Dictionary type
+ /// * the capacity of the array
Review comment:
What is the difference here between the "capacity of the array" and "the capacity of the keys"?
I have a rather special usage of concat for dictionary arrays, where I can guarantee that the dictionary values of all arrays are the same, or at least that the dictionary of the last array includes all previous dictionaries in the same order. I wonder whether that could be supported here and also fix the current concatenation logic that could lead to duplicated dictionary values. The rough idea would be to have another field here with an
```
enum DictionaryMergeStrategy {
Append(Option<usize>), // current logic, just append dictionary values with an optional capacity
Merge(Option<usize>), // actually merge the dictionaries using a hashmap, to be imlemented in a followup
UseLastValues, // use the values of the last dictionary array, might want to validate that this has a size at least as large as all other dictionaries so all keys are guaranteed to be in range
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2f318d4) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **increase** coverage by `0.00%`.
> The diff coverage is `96.66%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #382 +/- ##
=======================================
Coverage 82.61% 82.62%
=======================================
Files 162 162
Lines 44228 44247 +19
=======================================
+ Hits 36538 36557 +19
Misses 7690 7690
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `89.11% <93.33%> (-0.07%)` | :arrow_down: |
| [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
| [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...2f318d4](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jhorstmann commented on a change in pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
jhorstmann commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644266958
##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
} else {
buffer.push(0i32)
}
- let str_values_size = arrays
- .iter()
- .map(|data| {
- // get the length of the value buffer
- let buf_len = data.buffers()[1].len();
- // find the offset of the buffer
- // this returns a slice of offsets, starting from the offset of the array
- // so we can take the first value
- let offset = data.buffer::<Offset>(0)[0];
- buf_len - offset.to_usize().unwrap()
- })
- .sum::<usize>();
[
buffer,
- MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+ MutableBuffer::new(binary_size * mem::size_of::<u8>()),
]
}
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+ /// Binary, Utf8 and LargeUtf8 data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the binary/ str buffer
+ Binary(usize, Option<usize>),
+ /// List and LargeList data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the child data
+ List(usize, Option<Box<Capacities>>),
+ /// Struct type
+ /// * the capacity of the array
+ /// * the capacities of the fields
+ Struct(usize, Option<Vec<Capacities>>),
+ /// Dictionary type
+ /// * the capacity of the array
Review comment:
I haven't thought this through fully yet. Maybe the caller should actually specify the merged dictionary values array instead of a strategy. And that, or specifying the correct strategy, requires a special concat kernel anyway, so it's basically the same situation as now. I think you do not need to consider this use case in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] ritchie46 commented on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
ritchie46 commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850967804
@jorgecarleitao I implemented your proposal from #347. As we now need to define an `enum` in the `with_capacities` constructor, I want to be extra certain that I added all possible variants, as adding one later is backwards incompatable. Could you check if I missed something.
For now, only the `(large)-utf8` ones uses this, but if the design is ok, I can add more in later PRs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb merged pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
alamb merged pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857165876
Included in #411
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] ritchie46 commented on a change in pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
ritchie46 commented on a change in pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#discussion_r644194518
##########
File path: arrow/src/array/transform/mod.rs
##########
@@ -338,33 +338,61 @@ fn preallocate_str_buffer<Offset: StringOffsetSizeTrait>(
} else {
buffer.push(0i32)
}
- let str_values_size = arrays
- .iter()
- .map(|data| {
- // get the length of the value buffer
- let buf_len = data.buffers()[1].len();
- // find the offset of the buffer
- // this returns a slice of offsets, starting from the offset of the array
- // so we can take the first value
- let offset = data.buffer::<Offset>(0)[0];
- buf_len - offset.to_usize().unwrap()
- })
- .sum::<usize>();
[
buffer,
- MutableBuffer::new(str_values_size * mem::size_of::<u8>()),
+ MutableBuffer::new(binary_size * mem::size_of::<u8>()),
]
}
+/// Define capacities of child data or data buffers.
+#[derive(Debug, Clone)]
+pub enum Capacities {
+ /// Binary, Utf8 and LargeUtf8 data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the binary/ str buffer
+ Binary(usize, Option<usize>),
+ /// List and LargeList data types
+ /// Define
+ /// * the capacity of the array offsets
+ /// * the capacity of the child data
+ List(usize, Option<Box<Capacities>>),
+ /// Struct type
+ /// * the capacity of the array
+ /// * the capacities of the fields
+ Struct(usize, Option<Vec<Capacities>>),
+ /// Dictionary type
+ /// * the capacity of the array
Review comment:
> What is the difference here between the "capacity of the array" and "the capacity of the keys"?
Yes, that can be the same.
I also understand the use of defining the merge strategy. Regarding the API, should we make that available in a different method. Similar to `HashMap` having a method:
```rust
HashMap::with_capacity(...)
// and
HashMap::with_capacity_and_hasher(...)
```
We could do
```rust
MutableArrayData::with_capacies(...)
// and
MutableArrayData::with_capacities_and_merge_strategy(...)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter edited a comment on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (531a5f3) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **decrease** coverage by `0.04%`.
> The diff coverage is `59.45%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #382 +/- ##
==========================================
- Coverage 82.61% 82.57% -0.05%
==========================================
Files 162 162
Lines 44228 44280 +52
==========================================
+ Hits 36538 36563 +25
- Misses 7690 7717 +27
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `85.97% <49.15%> (-3.22%)` | :arrow_down: |
| [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
| [parquet/src/encodings/encoding.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nLnJz) | `94.85% <0.00%> (-0.20%)` | :arrow_down: |
| [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...531a5f3](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857137179
@jorgecarleitao what is the status of this one?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] codecov-commenter commented on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850953216
# [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#382](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2e47165) into [master](https://codecov.io/gh/apache/arrow-rs/commit/4ff2e56dc1d3ef4ab89e1a6e75014d77246f6b11?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (4ff2e56) will **decrease** coverage by `0.00%`.
> The diff coverage is `83.92%`.
[![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/382/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
```diff
@@ Coverage Diff @@
## master #382 +/- ##
==========================================
- Coverage 82.61% 82.60% -0.01%
==========================================
Files 162 162
Lines 44228 44253 +25
==========================================
+ Hits 36538 36556 +18
- Misses 7690 7697 +7
```
| [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [arrow/src/array/transform/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9tb2QucnM=) | `88.42% <83.63%> (-0.76%)` | :arrow_down: |
| [arrow/src/compute/kernels/concat.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jb25jYXQucnM=) | `100.00% <100.00%> (ø)` | |
| [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/382/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
------
[Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [4ff2e56...2e47165](https://codecov.io/gh/apache/arrow-rs/pull/382?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb edited a comment on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
alamb edited a comment on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-857165876
Included in #411 cherry pick
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] jorgecarleitao commented on pull request #382: make sure that only concat preallocates buffers
Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #382:
URL: https://github.com/apache/arrow-rs/pull/382#issuecomment-850961132
Thanks @ritchie46 . My small concerns with this PR:
1. We are introducing yet another builder API for something relatively easy to accomplish with an extra method
2. `preallocate_buffers` assumes that "preallocate" means allocate the maximum possible capacity, which IMO is not really pre-allocating, is just being very conservative and allocating the maximum.
3. The API is rather limited to the `concatenate` case.
I left some ideas as a comment on the issue
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org