Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/27 01:50:16 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #2947: Add GenericByteArray (#2946)

tustvold opened a new pull request, #2947:
URL: https://github.com/apache/arrow-rs/pull/2947

   # Which issue does this PR close?
   
   
   Closes #2946
   
   # Rationale for this change
    
   
   See ticket
   
   # What changes are included in this PR?
   
   
   Defines a `GenericByteArray` and migrates some methods across to it. Subsequent PRs can then work on eliminating the remaining duplication.
   
   # Are there any user-facing changes?
   
   No :tada:
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#issuecomment-1294291055

   Removed api-change label as this is not a breaking change




[GitHub] [arrow-rs] alamb commented on a diff in pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#discussion_r1007171469


##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,206 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing string data

Review Comment:
   ```suggestion
   /// See [`StringArray`] and [`LargeStringArray`] for storing utf8 encoded string data
   ```



##########
arrow-array/src/types.rs:
##########
@@ -464,7 +466,7 @@ impl Date64Type {
     }
 }
 
-mod private {
+mod decimal {

Review Comment:
   ```suggestion
   // Crate private / sealed types for Decimal
   // not intended to be used outside this crate
   mod decimal {
   ```



##########
arrow-array/src/array/string_array.rs:
##########
@@ -15,67 +15,27 @@
 // specific language governing permissions and limitations
 // under the License.
 
-use crate::iterator::GenericStringIter;
-use crate::raw_pointer::RawPtrBox;
+use crate::types::GenericStringType;
 use crate::{
-    empty_offsets, print_long_array, Array, ArrayAccessor, GenericBinaryArray,
-    GenericListArray, OffsetSizeTrait,
+    Array, GenericBinaryArray, GenericByteArray, GenericListArray, OffsetSizeTrait,
 };
-use arrow_buffer::{bit_util, Buffer, MutableBuffer};
+use arrow_buffer::{bit_util, MutableBuffer};
 use arrow_data::ArrayData;
 use arrow_schema::DataType;
-use std::any::Any;
 
 /// Generic struct for \[Large\]StringArray
 ///
 /// See [`StringArray`] and [`LargeStringArray`] for storing
 /// specific string data.
-pub struct GenericStringArray<OffsetSize: OffsetSizeTrait> {
-    data: ArrayData,
-    value_offsets: RawPtrBox<OffsetSize>,
-    value_data: RawPtrBox<u8>,
-}
+pub type GenericStringArray<OffsetSize> = GenericByteArray<GenericStringType<OffsetSize>>;

Review Comment:
   This is a very nice cleanup to remove duplication 🏆 
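
To illustrate the shape of this cleanup, here is a minimal sketch that does not depend on arrow-rs. The names `ByteKind`, `Utf8Kind`, `BinaryKind`, `ByteColumn`, `StringColumn` and `BinaryColumn` are hypothetical stand-ins, and the `Vec`-backed storage is an assumption made to keep the example short (the PR itself stores `ArrayData` and `RawPtrBox` fields). It only shows how one generic struct plus type aliases can replace two near-identical structs.

```rust
use std::marker::PhantomData;

/// Hypothetical marker trait, loosely modelled on the role `ByteArrayType` plays in the PR.
trait ByteKind {
    /// Borrowed form of one element (`str` or `[u8]`).
    type Native: ?Sized;
    /// Views raw bytes as `Self::Native`, or returns `None` if they are not valid.
    fn view(bytes: &[u8]) -> Option<&Self::Native>;
}

struct Utf8Kind;
struct BinaryKind;

impl ByteKind for Utf8Kind {
    type Native = str;
    fn view(bytes: &[u8]) -> Option<&str> {
        std::str::from_utf8(bytes).ok()
    }
}

impl ByteKind for BinaryKind {
    type Native = [u8];
    fn view(bytes: &[u8]) -> Option<&[u8]> {
        Some(bytes)
    }
}

/// One generic container: offsets delimit each element inside `values`.
struct ByteColumn<K: ByteKind> {
    offsets: Vec<usize>,
    values: Vec<u8>,
    _kind: PhantomData<K>,
}

impl<K: ByteKind> ByteColumn<K> {
    fn from_items(items: &[&[u8]]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for item in items {
            values.extend_from_slice(item);
            offsets.push(values.len());
        }
        Self { offsets, values, _kind: PhantomData }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Shared accessor: written once, used by both aliases below.
    fn get(&self, i: usize) -> Option<&K::Native> {
        let start = *self.offsets.get(i)?;
        let end = *self.offsets.get(i + 1)?;
        K::view(&self.values[start..end])
    }
}

// The aliases keep the old, specific names available to callers.
type StringColumn = ByteColumn<Utf8Kind>;
type BinaryColumn = ByteColumn<BinaryKind>;
```

With this layout, `StringColumn::get` yields `Option<&str>` and `BinaryColumn::get` yields `Option<&[u8]>` from the same method body.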



##########
arrow-array/src/types.rs:
##########
@@ -574,6 +576,83 @@ fn format_decimal_str(value_str: &str, precision: usize, scale: usize) -> String
     }
 }
 
+pub(crate) mod bytes {

Review Comment:
   ```suggestion
   // Crate private / sealed types for Byte arrays
   // not intended to be used outside this crate
   pub(crate) mod bytes {
   ```



##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,206 @@
+// Licensed to the Apache Software Foundation (ASF) under one

Review Comment:
   👍 



##########
arrow-array/src/types.rs:
##########
@@ -574,6 +576,83 @@ fn format_decimal_str(value_str: &str, precision: usize, scale: usize) -> String
     }
 }
 
+pub(crate) mod bytes {
+    use super::*;
+
+    pub trait ByteArrayTypeSealed {}
+    impl<O: OffsetSizeTrait> ByteArrayTypeSealed for GenericStringType<O> {}
+    impl<O: OffsetSizeTrait> ByteArrayTypeSealed for GenericBinaryType<O> {}
+
+    pub trait ByteArrayNativeType: std::fmt::Debug + Send + Sync {
+        /// # Safety
+        ///
+        /// `b` must be a valid byte sequence for `Self`
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self;
+    }
+
+    impl ByteArrayNativeType for [u8] {
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+            b
+        }
+    }
+
+    impl ByteArrayNativeType for str {
+        unsafe fn from_bytes_unchecked(b: &[u8]) -> &Self {
+            std::str::from_utf8_unchecked(b)
+        }
+    }
+}
+
+/// A trait over the variable-size byte array types
+///
+/// See [Variable Size Binary Layout](https://arrow.apache.org/docs/format/Columnar.html#variable-size-binary-layout)
+pub trait ByteArrayType: 'static + Send + Sync + bytes::ByteArrayTypeSealed {
+    type Offset: OffsetSizeTrait;
+    type Native: bytes::ByteArrayNativeType + AsRef<[u8]> + ?Sized;
+    const PREFIX: &'static str;

Review Comment:
   ```suggestion
       /// "Binary" or "String", for use in error messages
       const PREFIX: &'static str;
   ```
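
For readers unfamiliar with the sealed-trait pattern used in the hunk above, the following is a minimal sketch in plain Rust. The names `sealed`, `Text`, `Blob`, `Kind` and `out_of_bounds_message`, and the wording of the message, are hypothetical and are not taken from arrow-rs. The point is only that a public trait with a supertrait living in a private module cannot be implemented outside the crate, and that an associated constant such as `PREFIX` can be used to build type-specific error messages.

```rust
mod sealed {
    /// Public trait inside a private module: other crates cannot name it,
    /// so they cannot implement any trait that requires it.
    pub trait Sealed {}
    impl Sealed for super::Text {}
    impl Sealed for super::Blob {}
}

pub struct Text;
pub struct Blob;

/// Public trait that only this crate can implement, because of the `Sealed` bound.
pub trait Kind: sealed::Sealed {
    /// "String" or "Binary", for use in error messages.
    const PREFIX: &'static str;
}

impl Kind for Text {
    const PREFIX: &'static str = "String";
}

impl Kind for Blob {
    const PREFIX: &'static str = "Binary";
}

/// Builds a type-specific message from the associated constant (hypothetical wording).
pub fn out_of_bounds_message<K: Kind>(index: usize, len: usize) -> String {
    format!(
        "Trying to access an element at index {} from a {}Array of length {}",
        index,
        K::PREFIX,
        len
    )
}
```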





[GitHub] [arrow-rs] tustvold commented on pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#issuecomment-1294214868

   I intend to merge this PR this evening, i.e. in about 6 hours




[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#discussion_r1007404779


##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,206 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing string data
+///
+/// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes
+///
+/// [`StringArray`]: crate::StringArray
+/// [`LargeStringArray`]: crate::LargeStringArray
+/// [`BinaryArray`]: crate::BinaryArray
+/// [`LargeBinaryArray`]: crate::LargeBinaryArray
+pub struct GenericByteArray<T: ByteArrayType> {
+    data: ArrayData,
+    value_offsets: RawPtrBox<T::Offset>,
+    value_data: RawPtrBox<u8>,
+}
+
+impl<T: ByteArrayType> GenericByteArray<T> {
+    /// Data type of the array.
+    pub const DATA_TYPE: DataType = T::DATA_TYPE;
+
+    /// Returns the length for value at index `i`.
+    #[inline]
+    pub fn value_length(&self, i: usize) -> T::Offset {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
+    }

Review Comment:
   Slice indexing is checked by default, so it will panic in such a case. We could add an unchecked variant, but at that point the caller might as well just call `value_offsets` and do this manually
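
A small sketch of the point about checked indexing, using a plain `&[i32]` in place of the slice returned by `value_offsets()` (an assumption for brevity). `try_value_length` is a hypothetical name, not something this PR adds; it only shows what a caller could write by hand if a non-panicking lookup were wanted.

```rust
/// Length of element `i`, given `len + 1` non-decreasing offsets.
/// Slice indexing is bounds-checked, so an out-of-range `i` panics
/// rather than reading out of bounds. No `unsafe` is involved.
fn value_length(offsets: &[i32], i: usize) -> i32 {
    offsets[i + 1] - offsets[i]
}

/// Hypothetical non-panicking variant: returns `None` for an out-of-range `i`.
fn try_value_length(offsets: &[i32], i: usize) -> Option<i32> {
    let end = offsets.get(i.checked_add(1)?)?;
    let start = offsets.get(i)?;
    Some(end - start)
}
```

Because the indexing is checked, the function stays safe; the only behaviour a caller needs to know about is the panic, which is what a `# Panics` doc section is for.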





[GitHub] [arrow-rs] viirya commented on a diff in pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#discussion_r1007398376


##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,206 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing string data
+///
+/// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes
+///
+/// [`StringArray`]: crate::StringArray
+/// [`LargeStringArray`]: crate::LargeStringArray
+/// [`BinaryArray`]: crate::BinaryArray
+/// [`LargeBinaryArray`]: crate::LargeBinaryArray
+pub struct GenericByteArray<T: ByteArrayType> {
+    data: ArrayData,
+    value_offsets: RawPtrBox<T::Offset>,
+    value_data: RawPtrBox<u8>,
+}
+
+impl<T: ByteArrayType> GenericByteArray<T> {
+    /// Data type of the array.
+    pub const DATA_TYPE: DataType = T::DATA_TYPE;
+
+    /// Returns the length for value at index `i`.
+    #[inline]
+    pub fn value_length(&self, i: usize) -> T::Offset {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
+    }

Review Comment:
   So this should be `unsafe`?





[GitHub] [arrow-rs] tustvold merged pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
tustvold merged PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947




[GitHub] [arrow-rs] viirya commented on a diff in pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#discussion_r1007407818


##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,208 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing utf8 encoded string data
+///
+/// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes
+///
+/// [`StringArray`]: crate::StringArray
+/// [`LargeStringArray`]: crate::LargeStringArray
+/// [`BinaryArray`]: crate::BinaryArray
+/// [`LargeBinaryArray`]: crate::LargeBinaryArray
+pub struct GenericByteArray<T: ByteArrayType> {
+    data: ArrayData,
+    value_offsets: RawPtrBox<T::Offset>,
+    value_data: RawPtrBox<u8>,
+}
+
+impl<T: ByteArrayType> GenericByteArray<T> {
+    /// Data type of the array.
+    pub const DATA_TYPE: DataType = T::DATA_TYPE;
+
+    /// Returns the length for value at index `i`.
+    /// # Panics
+    /// Panics if index `i` is out of bounds.
+    #[inline]
+    pub fn value_length(&self, i: usize) -> T::Offset {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
+    }
+
+    /// Returns a clone of the value data buffer
+    pub fn value_data(&self) -> Buffer {
+        self.data.buffers()[1].clone()
+    }
+
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[T::Offset] {
+        // Soundness
+        //     pointer alignment & location is ensured by RawPtrBox
+        //     buffer bounds/offset is ensured by the ArrayData instance.
+        unsafe {
+            std::slice::from_raw_parts(
+                self.value_offsets.as_ptr().add(self.data.offset()),
+                self.len() + 1,
+            )
+        }
+    }
+
+    /// Returns the element at index `i` as bytes slice

Review Comment:
   ```suggestion
       /// Returns the element at index `i` as bytes slice or str
   ```



##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,208 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing utf8 encoded string data
+///
+/// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes
+///
+/// [`StringArray`]: crate::StringArray
+/// [`LargeStringArray`]: crate::LargeStringArray
+/// [`BinaryArray`]: crate::BinaryArray
+/// [`LargeBinaryArray`]: crate::LargeBinaryArray
+pub struct GenericByteArray<T: ByteArrayType> {
+    data: ArrayData,
+    value_offsets: RawPtrBox<T::Offset>,
+    value_data: RawPtrBox<u8>,
+}
+
+impl<T: ByteArrayType> GenericByteArray<T> {
+    /// Data type of the array.
+    pub const DATA_TYPE: DataType = T::DATA_TYPE;
+
+    /// Returns the length for value at index `i`.
+    /// # Panics
+    /// Panics if index `i` is out of bounds.
+    #[inline]
+    pub fn value_length(&self, i: usize) -> T::Offset {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
+    }
+
+    /// Returns a clone of the value data buffer
+    pub fn value_data(&self) -> Buffer {
+        self.data.buffers()[1].clone()
+    }
+
+    /// Returns the offset values in the offsets buffer
+    #[inline]
+    pub fn value_offsets(&self) -> &[T::Offset] {
+        // Soundness
+        //     pointer alignment & location is ensured by RawPtrBox
+        //     buffer bounds/offset is ensured by the ArrayData instance.
+        unsafe {
+            std::slice::from_raw_parts(
+                self.value_offsets.as_ptr().add(self.data.offset()),
+                self.len() + 1,
+            )
+        }
+    }
+
+    /// Returns the element at index `i` as bytes slice
+    /// # Safety
+    /// Caller is responsible for ensuring that the index is within the bounds of the array
+    pub unsafe fn value_unchecked(&self, i: usize) -> &T::Native {
+        let end = *self.value_offsets().get_unchecked(i + 1);
+        let start = *self.value_offsets().get_unchecked(i);
+
+        // Soundness
+        // pointer alignment & location is ensured by RawPtrBox
+        // buffer bounds/offset is ensured by the value_offset invariants
+
+        // Safety of `to_isize().unwrap()`
+        // `start` and `end` are &OffsetSize, which is a generic type that implements the
+        // OffsetSizeTrait. Currently, only i32 and i64 implement OffsetSizeTrait,
+        // both of which should cleanly cast to isize on an architecture that supports
+        // 32/64-bit offsets
+        let b = std::slice::from_raw_parts(
+            self.value_data.as_ptr().offset(start.to_isize().unwrap()),
+            (end - start).to_usize().unwrap(),
+        );
+
+        // SAFETY:
+        // ArrayData is valid
+        T::Native::from_bytes_unchecked(b)
+    }
+
+    /// Returns the element at index `i` as bytes slice

Review Comment:
   ```suggestion
       /// Returns the element at index `i` as bytes slice or str
   ```
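
The pairing of an `unsafe` unchecked accessor with a safe, bounds-checked wrapper can be sketched without arrow-rs as follows. `ByteValues` and `try_new` are hypothetical names, and `Vec` storage with `usize` offsets is an assumption; the PR's type instead relies on `ArrayData` validation and raw pointers. The sketch validates the offsets once at construction so that the unchecked read only has to trust the index.

```rust
/// Hypothetical stand-in for a variable-size byte array.
struct ByteValues {
    offsets: Vec<usize>,
    data: Vec<u8>,
}

impl ByteValues {
    /// Validates the invariants the unchecked accessor relies on:
    /// at least one offset, non-decreasing offsets, last offset within `data`.
    fn try_new(offsets: Vec<usize>, data: Vec<u8>) -> Option<Self> {
        let sorted = offsets.windows(2).all(|w| w[0] <= w[1]);
        let in_range = offsets.last().map_or(false, |&end| end <= data.len());
        if sorted && in_range {
            Some(Self { offsets, data })
        } else {
            None
        }
    }

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// # Safety
    /// Caller must ensure `i < self.len()`.
    unsafe fn value_unchecked(&self, i: usize) -> &[u8] {
        unsafe {
            let start = *self.offsets.get_unchecked(i);
            let end = *self.offsets.get_unchecked(i + 1);
            self.data.get_unchecked(start..end)
        }
    }

    /// Safe wrapper: checks the index, then delegates.
    ///
    /// # Panics
    /// Panics if `i` is out of bounds.
    fn value(&self, i: usize) -> &[u8] {
        assert!(
            i < self.len(),
            "index {} out of bounds for length {}",
            i,
            self.len()
        );
        // SAFETY: `i < self.len()` was checked above, and `try_new`
        // validated that every offset pair lies within `data`.
        unsafe { self.value_unchecked(i) }
    }
}
```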





[GitHub] [arrow-rs] ursabot commented on pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#issuecomment-1294393121

   Benchmark runs are scheduled for baseline = 73416f8e67efe1d0d8a8529c96c099429ab1b366 and contender = b6f08a87e02144277bb0a7aa3708e42f6faf7a26. b6f08a87e02144277bb0a7aa3708e42f6faf7a26 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/defadb7dae2b470389eef4af939aecf7...e912c957a3db46c1abb39ea55f6e4582/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/3c43a104e8c642cbae01e241de17d832...2262cc80dbcc4f6097add608ce440421/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/763ab2cb30de48c98ee35134f3b9e924...15d0fe3f53814f54bef0ef85ea7f3ce0/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ccb27cffd1c247ea8ef7d3e18563b1eb...3dc3835833a146bbb8f58808a4937402/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   




[GitHub] [arrow-rs] viirya commented on a diff in pull request #2947: Add GenericByteArray (#2946)

Posted by GitBox <gi...@apache.org>.
viirya commented on code in PR #2947:
URL: https://github.com/apache/arrow-rs/pull/2947#discussion_r1007398023


##########
arrow-array/src/array/byte_array.rs:
##########
@@ -0,0 +1,206 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::array::{empty_offsets, print_long_array};
+use crate::iterator::ArrayIter;
+use crate::raw_pointer::RawPtrBox;
+use crate::types::bytes::ByteArrayNativeType;
+use crate::types::ByteArrayType;
+use crate::{Array, ArrayAccessor, OffsetSizeTrait};
+use arrow_buffer::{ArrowNativeType, Buffer};
+use arrow_data::ArrayData;
+use arrow_schema::DataType;
+use std::any::Any;
+
+/// Generic struct for variable-size byte arrays
+///
+/// See [`StringArray`] and [`LargeStringArray`] for storing string data
+///
+/// See [`BinaryArray`] and [`LargeBinaryArray`] for storing arbitrary bytes
+///
+/// [`StringArray`]: crate::StringArray
+/// [`LargeStringArray`]: crate::LargeStringArray
+/// [`BinaryArray`]: crate::BinaryArray
+/// [`LargeBinaryArray`]: crate::LargeBinaryArray
+pub struct GenericByteArray<T: ByteArrayType> {
+    data: ArrayData,
+    value_offsets: RawPtrBox<T::Offset>,
+    value_data: RawPtrBox<u8>,
+}
+
+impl<T: ByteArrayType> GenericByteArray<T> {
+    /// Data type of the array.
+    pub const DATA_TYPE: DataType = T::DATA_TYPE;
+
+    /// Returns the length for value at index `i`.
+    #[inline]
+    pub fn value_length(&self, i: usize) -> T::Offset {
+        let offsets = self.value_offsets();
+        offsets[i + 1] - offsets[i]
+    }

Review Comment:
   `i` may be out of range?


