You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 21:23:32 UTC
[GitHub] [arrow-rs] tustvold opened a new issue #1072: String Comparison Kernels Non-Monotonic Offset Panic
tustvold opened a new issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072
**Describe the bug**
The string comparison kernels panic if they encounter non-monotonic offsets, even if the values for which they are encountered are null. Currently `ArrayDataBuilder`'s validation catches such arrays, however, I think this is actually a bug (#1071) as I don't think the arrow specification forbids nulls from having arbitrary offsets. I encountered this whilst working on a parquet StringArray decoder where I was hoping to leave the offsets for null values default zero-initialized and save some cycles.
**To Reproduce**
```
// Test handling of nulls with arbitrary offsets
let offsets = vec![2, 0, 2, 2];
let validity = vec![false, true, false];
let values = "ab";
let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
offsets_buffer.extend_from_slice(&offsets);
let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());
let mut values_buffer = MutableBuffer::new(values.len());
values_buffer.extend_from_slice(values.as_bytes());
let builder = ArrayDataBuilder::new(DataType::Utf8)
.len(validity.len())
.add_buffer(offsets_buffer.into())
.add_buffer(values_buffer.into())
.null_bit_buffer(validity_buffer.into());
// Validation prevents non-monotonic offsets (#1071)
let data = unsafe { builder.build_unchecked() };
let string = StringArray::from(data);
let eq = eq_utf8(&string, &string).unwrap();
assert!(min_boolean(&eq).unwrap())
```
**Expected behavior**
I would expect the above to pass, all non-null values have valid offsets. Instead it panics with
```
called `Option::unwrap()` on a `None` value
thread 'compute::kernels::comparison::tests::test_utf8_array_nulls' panicked at 'called `Option::unwrap()` on a `None` value', arrow/src/array/array_string.rs:101:40
stack backtrace:
0: rust_begin_unwind
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
1: core::panicking::panic_fmt
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
2: core::panicking::panic
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
3: core::option::Option<T>::unwrap
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:746:21
4: arrow::array::array_string::GenericStringArray<OffsetSize>::value_unchecked
at ./src/array/array_string.rs:101:13
5: arrow::compute::kernels::comparison::eq_utf8::{{closure}}
at ./src/compute/kernels/comparison.rs:56:35
6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:280:13
7: core::option::Option<T>::map
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:848:29
8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/iter/adapters/map.rs:103:9
9: arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
at ./src/buffer/mutable.rs:489:38
10: arrow::compute::kernels::comparison::eq_utf8
at ./src/compute/kernels/comparison.rs:685:5
11: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls
at ./src/compute/kernels/comparison.rs:2305:18
12: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls::{{closure}}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold closed issue #1072: String Comparison Kernels Non-Monotonic Offset Panic
Posted by GitBox <gi...@apache.org>.
tustvold closed issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] tustvold commented on issue #1072: String Comparison Kernels Non-Monotonic Offset Panic
Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072#issuecomment-1002587895
The consensus on #1071 is that offsets should be monotonically increasing, so closing this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-rs] alamb commented on issue #1072: String Comparison Kernels Non-Monotonic Offset Panic
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072#issuecomment-998315705
I don't think we necessarily want to support non monotonic offsets -- but the string comparison kernel shouldn't be checking the values of array slots that are not valid (aka are null)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org