Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/11 16:45:01 UTC

[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5964: Specialized Cursor for StringArray and BinaryArray

tustvold commented on code in PR #5964:
URL: https://github.com/apache/arrow-datafusion/pull/5964#discussion_r1163084143


##########
datafusion/core/src/physical_plan/sorts/cursor.rs:
##########
@@ -97,31 +99,104 @@ impl Cursor for RowCursor {
     }
 }
 
-/// A cursor over sorted, nullable [`ArrowNativeTypeOp`]
+/// An [`Array`] that can be converted into [`FieldValues`]
+pub trait FieldArray: Array + 'static {
+    type Values: FieldValues;
+
+    fn values(&self) -> Self::Values;
+}
+
+/// A comparable set of non-nullable values
+pub trait FieldValues {
+    type Value: ?Sized;
+
+    fn len(&self) -> usize;
+
+    fn compare(a: &Self::Value, b: &Self::Value) -> Ordering;
+
+    fn value(&self, idx: usize) -> &Self::Value;
+}
+
+impl<T: ArrowPrimitiveType> FieldArray for PrimitiveArray<T> {
+    type Values = PrimitiveValues<T::Native>;
+
+    fn values(&self) -> Self::Values {
+        PrimitiveValues(self.values().clone())
+    }
+}
+
+#[derive(Debug)]
+pub struct PrimitiveValues<T: ArrowNativeTypeOp>(ScalarBuffer<T>);
+
+impl<T: ArrowNativeTypeOp> FieldValues for PrimitiveValues<T> {
+    type Value = T;
+
+    fn len(&self) -> usize {
+        self.0.len()
+    }
+
+    #[inline]
+    fn compare(a: &Self::Value, b: &Self::Value) -> Ordering {
+        T::compare(*a, *b)
+    }
+
+    #[inline]
+    fn value(&self, idx: usize) -> &Self::Value {
+        &self.0[idx]
+    }
+}
+
+impl<T: ByteArrayType> FieldArray for GenericByteArray<T> {
+    type Values = Self;
+
+    fn values(&self) -> Self::Values {
+        self.clone()

Review Comment:
   I originally planned for this to deconstruct the array into its underlying buffers, to avoid redundant codegen for string and binary arrays. Unfortunately, this needed https://github.com/apache/arrow-rs/pull/4048.
   
   In practice the additional codegen is unlikely to matter, and we can always revisit this if it becomes a problem.
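   
   To illustrate the buffer-based idea, here is a minimal std-only sketch. It is not the code in this PR and does not use arrow-rs types: `ByteValues` and `from_slices` are hypothetical names, and plain `Vec`s stand in for the arrow offset and value buffers. The trait is copied from the diff above. Because the values type holds only offsets and bytes, strings and binary data share one `FieldValues` impl instead of one per `GenericByteArray<T>`.

```rust
use std::cmp::Ordering;

/// Copy of the `FieldValues` trait from the diff above.
pub trait FieldValues {
    type Value: ?Sized;

    fn len(&self) -> usize;

    fn compare(a: &Self::Value, b: &Self::Value) -> Ordering;

    fn value(&self, idx: usize) -> &Self::Value;
}

/// Hypothetical values type: one offsets buffer plus one byte buffer.
/// Plain `Vec`s are an assumption standing in for arrow buffers.
pub struct ByteValues {
    /// `offsets[i]..offsets[i + 1]` is the byte range of value `i`.
    offsets: Vec<usize>,
    values: Vec<u8>,
}

impl ByteValues {
    /// Hypothetical constructor that packs string or binary items
    /// into the two buffers.
    pub fn from_slices<T: AsRef<[u8]>>(items: &[T]) -> Self {
        let mut offsets = vec![0];
        let mut values = Vec::new();
        for item in items {
            values.extend_from_slice(item.as_ref());
            offsets.push(values.len());
        }
        Self { offsets, values }
    }
}

/// A single impl serves both strings and binary data, since Rust
/// orders `str` by its bytes, the same as `[u8]`.
impl FieldValues for ByteValues {
    type Value = [u8];

    fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    #[inline]
    fn compare(a: &Self::Value, b: &Self::Value) -> Ordering {
        a.cmp(b)
    }

    #[inline]
    fn value(&self, idx: usize) -> &Self::Value {
        &self.values[self.offsets[idx]..self.offsets[idx + 1]]
    }
}
```

   With this shape, `ByteValues::from_slices(&["b", "a"])` and a binary equivalent produce the same concrete type, so the cursor would be monomorphized once for both. Whether that saving is worth the extra conversion step is the trade-off described above.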


