Posted to github@arrow.apache.org by "askoa (via GitHub)" <gi...@apache.org> on 2023/02/01 07:49:35 UTC

[GitHub] [arrow-rs] askoa commented on a diff in pull request #3603: Add ArrayAccessor, Iterator, Extend and benchmarks for RunArray

askoa commented on code in PR #3603:
URL: https://github.com/apache/arrow-rs/pull/3603#discussion_r1092863125


##########
arrow-array/src/array/run_array.rs:
##########
@@ -274,15 +296,191 @@ pub type Int32RunArray = RunArray<Int32Type>;
 /// ```
 pub type Int64RunArray = RunArray<Int64Type>;
 
+/// A strongly-typed wrapper around a [`RunArray`] that implements [`ArrayAccessor`]
+/// and [`IntoIterator`] allowing fast access to its elements
+///
+/// ```
+/// use arrow_array::{RunArray, StringArray, types::Int32Type};
+///
+/// let orig = ["a", "b", "a", "b"];
+/// let ree_array = RunArray::<Int32Type>::from_iter(orig);
+///
+/// // `TypedRunArray` allows you to access the values directly
+/// let typed = ree_array.downcast_ref::<StringArray>().unwrap();
+///
+/// for (maybe_val, orig) in typed.into_iter().zip(orig) {
+///     assert_eq!(maybe_val.unwrap(), orig)
+/// }
+/// ```
+pub struct TypedRunArray<'a, R: RunEndIndexType, V> {
+    /// The run array
+    run_array: &'a RunArray<R>,
+
+    /// The values of the run_array
+    values: &'a V,
+}
+
+// Manually implement `Clone` to avoid `V: Clone` type constraint
+impl<'a, R: RunEndIndexType, V> Clone for TypedRunArray<'a, R, V> {
+    fn clone(&self) -> Self {
+        Self {
+            run_array: self.run_array,
+            values: self.values,
+        }
+    }
+}
+
+impl<'a, R: RunEndIndexType, V> Copy for TypedRunArray<'a, R, V> {}
+
+impl<'a, R: RunEndIndexType, V> std::fmt::Debug for TypedRunArray<'a, R, V> {
+    fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result {
+        writeln!(f, "TypedRunArray({:?})", self.run_array)
+    }
+}
+
+impl<'a, R: RunEndIndexType, V> TypedRunArray<'a, R, V> {
+    /// Returns the run_ends of this [`TypedRunArray`]
+    pub fn run_ends(&self) -> &'a PrimitiveArray<R> {
+        self.run_array.run_ends()
+    }
+
+    /// Returns the values of this [`TypedRunArray`]
+    pub fn values(&self) -> &'a V {
+        self.values
+    }
+
+    /// Returns index to the physical array for the given index to the logical array.
+    /// Performs a binary search on the run_ends array for the input index.
+    #[inline]
+    pub fn get_physical_index(&self, logical_index: usize) -> Option<usize> {
+        if logical_index >= self.run_array.len() {
+            return None;
+        }
+        let mut st: usize = 0;

Review Comment:
   To use `binary_search_by`, a custom `Ordering` has to be implemented. Two values (`i` and `i-1`) have to be accessed to determine `std::cmp::Ordering::Equal`. I am not sure what will happen if `Equal` is never returned by the custom `Ordering`. I did not spend a lot of time on it because I did not see the benefit.
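
For reference, a minimal sketch of what that alternative could look like, using only the standard library. The helper names `physical_index` and `physical_index_by` are hypothetical, and `run_ends` is a plain `&[i32]` slice rather than the `PrimitiveArray<R>` used in this PR, so this is not the code under review. As far as I understand the `std` documentation, a comparator that never returns `Equal` makes `binary_search_by` return `Err` with the insertion point, which is what `partition_point` computes directly.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: maps a logical index to the index of the run that
/// contains it. `run_ends` holds the exclusive, strictly increasing end of
/// each run, so the last entry is the logical length.
fn physical_index(run_ends: &[i32], logical_index: usize) -> Option<usize> {
    let len = *run_ends.last()? as usize;
    if logical_index >= len {
        return None;
    }
    // Index of the first run whose end is strictly greater than the logical index.
    Some(run_ends.partition_point(|&end| end as usize <= logical_index))
}

/// Same lookup via `binary_search_by` with a comparator that never returns
/// `Equal`; the search then always yields `Err(insertion_point)`.
fn physical_index_by(run_ends: &[i32], logical_index: usize) -> Option<usize> {
    let len = *run_ends.last()? as usize;
    if logical_index >= len {
        return None;
    }
    let found = run_ends.binary_search_by(|&end| {
        if end as usize <= logical_index {
            Ordering::Less
        } else {
            Ordering::Greater
        }
    });
    // `Ok` is not expected here because `Equal` is never returned.
    Some(found.unwrap_or_else(|insertion_point| insertion_point))
}
```

With run ends `[2, 5, 6]`, logical indices 0 and 1 fall in run 0, indices 2 to 4 in run 1, and index 5 in run 2. Neither variant needs to read `i-1`, although whether either is faster than the hand-written loop would have to be benchmarked.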


