Posted to github@arrow.apache.org by "joshg-ec (via GitHub)" <gi...@apache.org> on 2023/05/31 15:20:55 UTC
[GitHub] [arrow-rs] joshg-ec opened a new issue, #4324: concat_batches fails with total_len <= bit_len assertion for records with lists
joshg-ec opened a new issue, #4324:
URL: https://github.com/apache/arrow-rs/issues/4324
**Describe the bug**
`concat`, used by `concat_batches`, does not appear to allocate sufficient `capacities` when constructing the `MutableArrayData`. Concatenating records that contain lists of structs results in the following panic:
```
assertion failed: total_len <= bit_len
thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len', /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
stack backtrace:
   0: rust_begin_unwind
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
   1: core::panicking::panic_fmt
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
   2: core::panicking::panic
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
   3: arrow_buffer::buffer::boolean::BooleanBuffer::new
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
   4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
   5: core::bool::<impl bool>::then
             at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
   6: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
   7: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
   8: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
   9: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
  10: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
  11: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
  12: arrow_data::transform::_MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
  13: arrow_data::transform::MutableArrayData::freeze
             at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
```
**To Reproduce**
Call `concat_batches` with `RecordBatch`es that contain lists of structs (on the order of 20–50 structs in the list per `RecordBatch`). If I modify [the capacity calculation in concat](https://github.com/apache/arrow-rs/blob/c295b172b37902d5fa41ef275ff5b86caf9fde75/arrow-select/src/concat.rs#L76-L82) to add a constant amount of extra capacity for lists, the error does not occur:
```rust
let capacity = match d {
    DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
    DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
    DataType::Binary => binary_capacity::<BinaryType>(arrays),
    DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
    DataType::List(_) => {
        Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>() + 500) // <- 500 added here
    }
    _ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
};
```
```
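For readers unfamiliar with the list layout, the snippet above sizes a list column by its number of top-level slots (`a.len()`), not by the number of child values the lists contain. The following is a minimal std-only sketch of that arithmetic. It uses plain offset vectors in place of Arrow arrays, all function names are hypothetical, and it assumes non-decreasing `i32` offsets. It only illustrates the difference between the two counts and does not establish the cause of the panic.

```rust
/// Number of top-level list slots described by an offsets buffer.
/// A list column with `n` slots stores `n + 1` offsets.
pub fn list_slots(offsets: &[i32]) -> usize {
    offsets.len().saturating_sub(1)
}

/// Number of child values referenced by the offsets.
/// Assumes the offsets are non-decreasing, as the list layout requires.
pub fn child_values(offsets: &[i32]) -> usize {
    match (offsets.first(), offsets.last()) {
        (Some(first), Some(last)) => (last - first) as usize,
        _ => 0,
    }
}

/// Same shape as summing `a.len()` over the input arrays:
/// counts top-level slots only.
pub fn slot_capacity(batches: &[Vec<i32>]) -> usize {
    batches.iter().map(|offsets| list_slots(offsets)).sum()
}

/// Total child values across all inputs (hypothetical helper for comparison).
pub fn child_capacity(batches: &[Vec<i32>]) -> usize {
    batches.iter().map(|offsets| child_values(offsets)).sum()
}
```

With two inputs whose offsets are `[0, 20, 45]` and `[0, 30]`, the slot count is 3 while the child count is 75, which is the kind of gap a list of 20–50 structs per batch would produce.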
**Expected behavior**
No panics when concatenating lists.
**Additional context**
Reproduced with Arrow versions 37–40. The error does not occur in version 34.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1570500854
Possibly related to https://github.com/apache/arrow-rs/issues/1230. The assertion would suggest that the validity buffer is not the correct length. I'll take a look in a bit; `MutableArrayData` needs some TLC in this regard (#1225).
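To make the "validity buffer is not the correct length" reading concrete, here is a simplified std-only model of the kind of check the `total_len <= bit_len` assertion expresses: the requested bit range (offset plus length) must fit inside the bits the byte buffer actually holds. The `ValidityBits` type and `bytes_for_bits` helper are hypothetical, least-significant-bit-first ordering is assumed, and this is a sketch rather than the arrow-buffer implementation.

```rust
/// Hypothetical stand-in for a validity bitmap: bytes plus a bit offset and bit length.
#[derive(Debug)]
pub struct ValidityBits {
    bytes: Vec<u8>,
    offset: usize,
    len: usize,
}

impl ValidityBits {
    /// Returns `None` where the modeled assertion would fail, i.e. when
    /// `offset + len` exceeds the number of bits available in `bytes`.
    pub fn try_new(bytes: Vec<u8>, offset: usize, len: usize) -> Option<Self> {
        let total_len = offset.saturating_add(len);
        let bit_len = bytes.len().saturating_mul(8);
        if total_len <= bit_len {
            Some(Self { bytes, offset, len })
        } else {
            None
        }
    }

    /// Reads bit `i` of the logical range (assumes LSB-first bit order).
    pub fn is_valid(&self, i: usize) -> bool {
        assert!(i < self.len, "index out of bounds");
        let bit = self.offset + i;
        (self.bytes[bit / 8] >> (bit % 8)) & 1 == 1
    }
}

/// Number of bytes needed to hold `bits` validity bits.
pub fn bytes_for_bits(bits: usize) -> usize {
    (bits + 7) / 8
}
```

In this model a two-byte buffer can back at most 16 bits, so `ValidityBits::try_new(vec![0xFF; 2], 0, 17)` returns `None`. A null bitmap that was extended for fewer rows than the array's final length would fail the same way.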
[GitHub] [arrow-rs] tustvold closed issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
URL: https://github.com/apache/arrow-rs/issues/4324
[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1572044713
Are you able to share a reproducer for this?
[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1573270702
OK, it looks like this is https://github.com/apache/arrow-rs/issues/1230.
As a happy accident, https://github.com/apache/arrow-rs/pull/4333 fixed your reproducer, as it removed the use of `extend_nulls` when appending structs.
[GitHub] [arrow-rs] joshg-ec commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
Posted by "joshg-ec (via GitHub)" <gi...@apache.org>.
joshg-ec commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1572822818
Sure, see https://github.com/ElementalCognition/arrow-bug.