You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/20 21:23:32 UTC

[GitHub] [arrow-rs] tustvold opened a new issue #1072: String Comparison Kernels Non-Monotonic Offset Panic

tustvold opened a new issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072


   **Describe the bug**
   
   The string comparison kernels panic if they encounter non-monotonic offsets, even if the values for which they are encountered are null. Currently `ArrayDataBuilder`'s validation catches such arrays, however, I think this is actually a bug (#1071) as I don't think the arrow specification forbids nulls from having arbitrary offsets. I encountered this whilst working on a parquet StringArray decoder where I was hoping to leave the offsets for null values default zero-initialized and save some cycles.
   
   **To Reproduce**
   
   ```
   // Test handling of nulls with arbitrary offsets
   let offsets = vec![2, 0, 2, 2];
   let validity = vec![false, true, false];
   let values = "ab";
   
   let mut offsets_buffer = MutableBuffer::new(offsets.len() * 4);
   offsets_buffer.extend_from_slice(&offsets);
   
   let validity_buffer = MutableBuffer::from_iter(validity.iter().cloned());
   
   let mut values_buffer = MutableBuffer::new(values.len());
   values_buffer.extend_from_slice(values.as_bytes());
   
   let builder = ArrayDataBuilder::new(DataType::Utf8)
       .len(validity.len())
       .add_buffer(offsets_buffer.into())
       .add_buffer(values_buffer.into())
       .null_bit_buffer(validity_buffer.into());
   
   // Validation prevents non-monotonic offsets (#1071)
   let data = unsafe { builder.build_unchecked() };
   
   let string = StringArray::from(data);
   let eq = eq_utf8(&string, &string).unwrap();
   assert!(min_boolean(&eq).unwrap())
   ```
   
   **Expected behavior**
   
   I would expect the above to pass, all non-null values have valid offsets. Instead it panics with
   
   ```
   called `Option::unwrap()` on a `None` value
   thread 'compute::kernels::comparison::tests::test_utf8_array_nulls' panicked at 'called `Option::unwrap()` on a `None` value', arrow/src/array/array_string.rs:101:40
   stack backtrace:
      0: rust_begin_unwind
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/std/src/panicking.rs:517:5
      1: core::panicking::panic_fmt
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:100:14
      2: core::panicking::panic
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/panicking.rs:50:5
      3: core::option::Option<T>::unwrap
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:746:21
      4: arrow::array::array_string::GenericStringArray<OffsetSize>::value_unchecked
                at ./src/array/array_string.rs:101:13
      5: arrow::compute::kernels::comparison::eq_utf8::{{closure}}
                at ./src/compute/kernels/comparison.rs:56:35
      6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/ops/function.rs:280:13
      7: core::option::Option<T>::map
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/option.rs:848:29
      8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
                at /rustc/f1edd0429582dd29cccacaf50fd134b05593bd9c/library/core/src/iter/adapters/map.rs:103:9
      9: arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter_bool
                at ./src/buffer/mutable.rs:489:38
     10: arrow::compute::kernels::comparison::eq_utf8
                at ./src/compute/kernels/comparison.rs:685:5
     11: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls
                at ./src/compute/kernels/comparison.rs:2305:18
     12: arrow::compute::kernels::comparison::tests::test_utf8_array_nulls::{{closure}}
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold closed issue #1072: String Comparison Kernels Non-Monotonic Offset Panic

Posted by GitBox <gi...@apache.org>.
tustvold closed issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] tustvold commented on issue #1072: String Comparison Kernels Non-Monotonic Offset Panic

Posted by GitBox <gi...@apache.org>.
tustvold commented on issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072#issuecomment-1002587895


   The consensus on #1071 is that offsets should be monotonically increasing, so closing this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #1072: String Comparison Kernels Non-Monotonic Offset Panic

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #1072:
URL: https://github.com/apache/arrow-rs/issues/1072#issuecomment-998315705


   I don't think we necessarily want to support non monotonic offsets -- but the string comparison kernel shouldn't be checking the values of array slots that are not valid (aka are null) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org