Posted to github@arrow.apache.org by "joshg-ec (via GitHub)" <gi...@apache.org> on 2023/05/31 15:20:55 UTC

[GitHub] [arrow-rs] joshg-ec opened a new issue, #4324: concat_batches fails with total_len <= bit_len assertion for records with lists

joshg-ec opened a new issue, #4324:
URL: https://github.com/apache/arrow-rs/issues/4324

   **Describe the bug**
   `concat`, used by `concat_batches`, does not appear to allocate sufficient `capacities` when constructing the `MutableArrayData`. Concatenating records that contain lists of structs results in the following panic:
   ```
   assertion failed: total_len <= bit_len
   thread 'concat_test' panicked at 'assertion failed: total_len <= bit_len', /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/std/src/panicking.rs:579:5
      1: core::panicking::panic_fmt
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:64:14
      2: core::panicking::panic
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/panicking.rs:114:5
      3: arrow_buffer::buffer::boolean::BooleanBuffer::new
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-buffer-40.0.0/src/buffer/boolean.rs:55:9
      4: arrow_data::transform::_MutableArrayData::freeze::{{closure}}
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:81:25
      5: core::bool::<impl bool>::then
                at /rustc/84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc/library/core/src/bool.rs:71:24
      6: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:80:21
      7: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
      8: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
      9: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
     10: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
     11: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
     12: arrow_data::transform::_MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:74:37
     13: arrow_data::transform::MutableArrayData::freeze
                at /Users/x/.cargo/registry/src/index.crates.io-6f17d22bba15001f/arrow-data-40.0.0/src/transform/mod.rs:656:18
   ```
   
   **To Reproduce**
   Call `concat_batches` with `RecordBatch`s that contain lists of structs (on the order of 20–50 structs in the list per `RecordBatch`). If I modify [the capacity calculation in concat](https://github.com/apache/arrow-rs/blob/c295b172b37902d5fa41ef275ff5b86caf9fde75/arrow-select/src/concat.rs#L76-L82) to add a constant factor for lists, the error does not occur:
   ```rust
       let capacity = match d {
           DataType::Utf8 => binary_capacity::<Utf8Type>(arrays),
           DataType::LargeUtf8 => binary_capacity::<LargeUtf8Type>(arrays),
           DataType::Binary => binary_capacity::<BinaryType>(arrays),
           DataType::LargeBinary => binary_capacity::<LargeBinaryType>(arrays),
           DataType::List(_) => {
               Capacities::Array(arrays.iter().map(|a| a.len()).sum::<usize>() + 500) // <- 500 added here
           }
           _ => Capacities::Array(arrays.iter().map(|a| a.len()).sum()),
       };
   ```
   
   **Expected behavior**
   No panics when concatenating lists.
   
   **Additional context**
   Reproduced with Arrow versions 37–40. The error does not occur in version 34.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1570500854

   Possibly related to https://github.com/apache/arrow-rs/issues/1230. The assertion would suggest that the validity buffer is not the correct length. I'll take a look in a bit; `MutableArrayData` needs some TLC in this regard (#1225).
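
   For readers unfamiliar with the assertion in the panic message, the following is a minimal sketch of the kind of check it implies: the bits requested (`offset + len`) must fit inside the bitmap's byte buffer. The helper names `validity_fits` and `bytes_for_bits` are hypothetical and exist only for illustration; this is a simplified stand-in under that assumption, not the actual `arrow-buffer` code.

   ```rust
   /// Hypothetical helper: returns true when `offset + len` validity bits fit
   /// inside a bitmap that is `byte_len` bytes long. This mirrors the shape of
   /// the failing assertion (`total_len <= bit_len`), nothing more.
   fn validity_fits(byte_len: usize, offset: usize, len: usize) -> bool {
       // Saturating arithmetic avoids overflow on very large inputs.
       let total_len = offset.saturating_add(len);
       let bit_len = byte_len.saturating_mul(8);
       total_len <= bit_len
   }

   /// Hypothetical helper: bytes needed to store one validity bit per element.
   fn bytes_for_bits(len: usize) -> usize {
       (len + 7) / 8
   }
   ```

   Under this sketch, a bitmap sized for 20 elements (3 bytes, 24 bits) passes the check for a length of 20 but fails it for a length of 50, which is the sort of mismatch a too-short validity buffer would produce.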


[GitHub] [arrow-rs] tustvold closed issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists
URL: https://github.com/apache/arrow-rs/issues/4324


[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1572044713

   Are you able to share a reproducer for this?


[GitHub] [arrow-rs] tustvold commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1573270702

   OK, it looks like this is https://github.com/apache/arrow-rs/issues/1230.
   
   As a happy accident, https://github.com/apache/arrow-rs/pull/4333 fixed your reproducer, as it removed the use of `extend_nulls` when appending structs.


[GitHub] [arrow-rs] joshg-ec commented on issue #4324: concat_batches panics with total_len <= bit_len assertion for records with lists

Posted by "joshg-ec (via GitHub)" <gi...@apache.org>.
joshg-ec commented on issue #4324:
URL: https://github.com/apache/arrow-rs/issues/4324#issuecomment-1572822818

   Sure, see https://github.com/ElementalCognition/arrow-bug.

