Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/02/10 18:14:00 UTC

[GitHub] [arrow-rs] tustvold opened a new issue, #3691: Cast Binary to Utf8 With Safe True is Unsound

tustvold opened a new issue, #3691:
URL: https://github.com/apache/arrow-rs/issues/3691

   **Describe the bug**
   
   `cast_binary_to_string`, added in #3624, is unsound: it creates a `StringArray` containing invalid UTF-8 data. The offending value is marked null, but that is insufficient to meet the `ArrayData` contract, which requires all of the data to be valid UTF-8.
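   
   To illustrate the contract described above, here is a minimal sketch using only the Rust standard library. It is not the arrow-rs implementation or its eventual fix; `SketchStringArray` and `safe_binary_to_utf8` are hypothetical names invented for this example. The assumption is that a sound "safe" cast must never copy invalid bytes into the values buffer, so a null slot becomes a zero-length range instead of a range over invalid data.
   
   ```rust
   /// Hypothetical stand-in for a variable-length string array:
   /// `offsets[i]..offsets[i + 1]` is the byte range of slot `i` in `values`.
   struct SketchStringArray {
       offsets: Vec<i32>,
       values: Vec<u8>,
       validity: Vec<bool>,
   }
   
   impl SketchStringArray {
       /// Returns `None` for null slots, otherwise the string stored in slot `i`.
       fn value(&self, i: usize) -> Option<&str> {
           if !self.validity[i] {
               return None;
           }
           let start = self.offsets[i] as usize;
           let end = self.offsets[i + 1] as usize;
           std::str::from_utf8(&self.values[start..end]).ok()
       }
   }
   
   /// Hypothetical "safe" cast: invalid UTF-8 inputs become nulls, and their
   /// bytes are never written to the values buffer.
   fn safe_binary_to_utf8(inputs: &[&[u8]]) -> SketchStringArray {
       let mut offsets = vec![0i32];
       let mut values = Vec::new();
       let mut validity = Vec::with_capacity(inputs.len());
       for bytes in inputs {
           match std::str::from_utf8(bytes) {
               Ok(s) => {
                   values.extend_from_slice(s.as_bytes());
                   validity.push(true);
               }
               // Invalid input: mark the slot null and leave a zero-length range.
               Err(_) => validity.push(false),
           }
           // Sketch only: a real implementation would guard against i32 overflow.
           offsets.push(values.len() as i32);
       }
       SketchStringArray { offsets, values, validity }
   }
   ```
   
   With the two inputs from the reproducer in this issue, the sketch yields a null in slot 0, a values buffer that holds only the bytes of the second input, and therefore a buffer that is valid UTF-8 from end to end.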
   
   **To Reproduce**
   
   ```
   #[test]
   fn test_cast_invalid_utf8() {
       // The first value is not valid UTF-8; the second is (NUL is a legal code point).
       let v1: &[u8] = b"\xFF invalid";
       let v2: &[u8] = b"\x00 Foo";
       let s = BinaryArray::from(vec![v1, v2]);
       let options = CastOptions { safe: true };
       let array = cast_with_options(&s, &DataType::Utf8, &options).unwrap();
       let a = as_string_array(array.as_ref());
       // Full validation, including the UTF-8 check on the value data.
       a.data().validate_full().unwrap();

       assert_eq!(a.null_count(), 1);
       assert_eq!(a.len(), 2);
       assert!(a.is_null(0));
       assert_eq!(a.value(0), "");
       assert_eq!(a.value(1), "\x00 Foo");
   }
   ```
   
   **Expected behavior**
   
   **Additional context**


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold closed issue #3691: Cast Binary to Utf8 With Safe True is Unsound

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3691: Cast Binary to Utf8 With Safe True is Unsound
URL: https://github.com/apache/arrow-rs/issues/3691




[GitHub] [arrow-rs] alamb commented on issue #3691: Cast Binary to Utf8 With Safe True is Unsound

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3691:
URL: https://github.com/apache/arrow-rs/issues/3691#issuecomment-1426202673

   FWIW this was found by testing the pre-release version against DataFusion: https://github.com/apache/arrow-datafusion/pull/5241

