You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/02 06:44:16 UTC

[GitHub] [arrow-rs] HaoYang670 commented on a diff in pull request #1633: Add `substring` support for `FixedSizeBinaryArray`

HaoYang670 commented on code in PR #1633:
URL: https://github.com/apache/arrow-rs/pull/1633#discussion_r862620323


##########
arrow/src/compute/kernels/substring.rs:
##########
@@ -86,6 +86,52 @@ fn binary_substring<OffsetSize: BinaryOffsetSizeTrait>(
     Ok(make_array(data))
 }
 
+fn fixed_size_binary_substring(
+    array: &FixedSizeBinaryArray,
+    old_len: i32,
+    start: i32,
+    length: Option<i32>,
+) -> Result<ArrayRef> {
+    let new_start = if start >= 0 {
+        start.min(old_len)
+    } else {
+        (old_len + start).max(0)
+    };
+    let new_len = match length {
+        Some(len) => len.min(old_len - new_start),
+        None => old_len - new_start,
+    };

Review Comment:
   I think the answer is no for several reasons:
   1. the definition of the public function makes sure that `length` is always >=0
   ```rust
   pub fn substring(array: &dyn Array, start: i64, length: Option<u64>) -> Result<ArrayRef> {
   ```
   2. the definition of the data type of `FixedSizeBinary` seems to allow negative value, although it is somewhat weird:
   ```rust
       /// Opaque binary data of fixed size.
       /// Enum parameter specifies the number of bytes per value.
       FixedSizeBinary(i32),
   ```
   
   Although it may be more reasonable to use unsigned int to type the `length`, in Apache Arrow specification, the `length` must be `i32`. https://arrow.apache.org/docs/format/Columnar.html#fixed-size-list-layout (For nested arrays, we always use signed integer to represent the length or offsets.)
    ```
   A fixed size list type is specified like FixedSizeList<T>[N], where T is any type (primitive or nested) and N is a 32-bit signed integer representing the length of the lists.
   ```
   
   What would happen if we give a negative length or negative offsets buffer? 
   This is a fun game!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org