You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/03/23 12:02:36 UTC

[GitHub] [arrow-rs] tustvold opened a new pull request, #3912: Add AsArray trait for more ergonomic downcasting

tustvold opened a new pull request, #3912:
URL: https://github.com/apache/arrow-rs/pull/3912

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #.
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   We currently provide downcasting utility functions, however, they have a couple of issues:
   
   * Lack of method chaining, makes for more nested and potentially less legible code
   * Methods are named inconsistently and are unnecessarily verbose - e.g `as_largestring_array` but `as_large_list_array`
   * Not all methods can easily be used in generic context, e.g. not `as_byte_array`
   * When called on `&ArrayRef` result in two dynamic dispatches, as `ArrayRef` is itself coerced to `dyn Array`
   * Lack of autoderef means you often need to add `&` as necessary
   * Each method has to be imported separately
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146128716


##########
arrow-array/src/cast.rs:
##########
@@ -709,6 +709,165 @@ where
     T::from(array.to_data())
 }
 
+mod private {
+    pub trait Sealed {}
+}
+
+/// An extension trait for `dyn Array` that provides ergonomic downcasting
+///
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{ArrayRef, Int32Array};
+/// # use arrow_array::cast::AsArray;
+/// # use arrow_array::types::Int32Type;
+/// let col = Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef;
+/// assert_eq!(col.as_primitive::<Int32Type>().values(), &[1, 2, 3]);
+/// ```
+pub trait AsArray: private::Sealed {
+    /// Downcast this to a [`BooleanArray`] returning `None` if not possible
+    fn as_boolean_opt(&self) -> Option<&BooleanArray>;
+
+    /// Downcast this to a [`BooleanArray`] panicking if not possible
+    fn as_boolean(&self) -> &BooleanArray {
+        self.as_boolean_opt().expect("boolean array")
+    }
+
+    /// Downcast this to a [`PrimitiveArray`] returning `None` if not possible
+    fn as_primitive_opt<T: ArrowPrimitiveType>(&self) -> Option<&PrimitiveArray<T>>;
+
+    /// Downcast this to a [`PrimitiveArray`] panicking if not possible
+    fn as_primitive<T: ArrowPrimitiveType>(&self) -> &PrimitiveArray<T> {
+        self.as_primitive_opt().expect("primitive array")
+    }
+
+    /// Downcast this to a [`GenericByteArray`] returning `None` if not possible
+    fn as_bytes_opt<T: ByteArrayType>(&self) -> Option<&GenericByteArray<T>>;

Review Comment:
   In hindsight I probably should have called this `GenericBytesArray` but hey ho



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#issuecomment-1481773173

   > Mark as_XXX_array functions deprecated (or maybe make a note to do so in a later release)
   
   I will make a note to do so in a future PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] appletreeisyellow commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "appletreeisyellow (via GitHub)" <gi...@apache.org>.
appletreeisyellow commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146660972


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   Awesome! 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146783510


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   There is also `as_bytes::<Utf8Type>()` / `as_bytes::<BinaryType>`, etc... if that is more obvious for users? The `as_binary` and `as_string` are just short-cuts for the common case of only caring about one
   
   Edit: Ultimately I'm not sure there is a way to abstract away caring about the offsets, if you're dealing with variable length types, be they strings, binary or lists, you need to type the offset...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146547284


##########
arrow-ord/src/comparison.rs:
##########
@@ -893,14 +875,8 @@ pub fn lt_eq_dyn_binary_scalar(
     right: &[u8],
 ) -> Result<BooleanArray, ArrowError> {
     match left.data_type() {
-        DataType::Binary => {
-            let left = as_generic_binary_array::<i32>(left);
-            lt_eq_binary_scalar(left, right)
-        }
-        DataType::LargeBinary => {
-            let left = as_generic_binary_array::<i64>(left);
-            lt_eq_binary_scalar(left, right)
-        }
+        DataType::Binary => lt_eq_binary_scalar(left.as_binary::<i32>(), right),

Review Comment:
   it sure looks nicer



##########
arrow-array/src/cast.rs:
##########
@@ -709,6 +709,165 @@ where
     T::from(array.to_data())
 }
 
+mod private {
+    pub trait Sealed {}
+}
+
+/// An extension trait for `dyn Array` that provides ergonomic downcasting
+///
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{ArrayRef, Int32Array};
+/// # use arrow_array::cast::AsArray;
+/// # use arrow_array::types::Int32Type;
+/// let col = Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef;
+/// assert_eq!(col.as_primitive::<Int32Type>().values(), &[1, 2, 3]);
+/// ```
+pub trait AsArray: private::Sealed {
+    /// Downcast this to a [`BooleanArray`] returning `None` if not possible
+    fn as_boolean_opt(&self) -> Option<&BooleanArray>;
+
+    /// Downcast this to a [`BooleanArray`] panicking if not possible
+    fn as_boolean(&self) -> &BooleanArray {
+        self.as_boolean_opt().expect("boolean array")
+    }
+
+    /// Downcast this to a [`PrimitiveArray`] returning `None` if not possible
+    fn as_primitive_opt<T: ArrowPrimitiveType>(&self) -> Option<&PrimitiveArray<T>>;
+
+    /// Downcast this to a [`PrimitiveArray`] panicking if not possible
+    fn as_primitive<T: ArrowPrimitiveType>(&self) -> &PrimitiveArray<T> {
+        self.as_primitive_opt().expect("primitive array")
+    }
+
+    /// Downcast this to a [`GenericByteArray`] returning `None` if not possible
+    fn as_bytes_opt<T: ByteArrayType>(&self) -> Option<&GenericByteArray<T>>;
+
+    /// Downcast this to a [`GenericByteArray`] panicking if not possible
+    fn as_bytes<T: ByteArrayType>(&self) -> &GenericByteArray<T> {
+        self.as_bytes_opt().expect("byte array")
+    }
+
+    /// Downcast this to a [`GenericStringArray`] returning `None` if not possible
+    fn as_string_opt<O: OffsetSizeTrait>(&self) -> Option<&GenericStringArray<O>> {
+        self.as_bytes_opt()
+    }
+
+    /// Downcast this to a [`GenericStringArray`] panicking if not possible
+    fn as_string<O: OffsetSizeTrait>(&self) -> &GenericStringArray<O> {
+        self.as_bytes_opt().expect("string array")
+    }
+
+    /// Downcast this to a [`GenericBinaryArray`] returning `None` if not possible
+    fn as_binary_opt<O: OffsetSizeTrait>(&self) -> Option<&GenericBinaryArray<O>> {
+        self.as_bytes_opt()
+    }
+
+    /// Downcast this to a [`GenericBinaryArray`] panicking if not possible
+    fn as_binary<O: OffsetSizeTrait>(&self) -> &GenericBinaryArray<O> {
+        self.as_bytes_opt().expect("binary array")
+    }
+
+    /// Downcast this to a [`StructArray`] returning `None` if not possible
+    fn as_struct_opt(&self) -> Option<&StructArray>;
+
+    /// Downcast this to a [`StructArray`] panicking if not possible
+    fn as_struct(&self) -> &StructArray {
+        self.as_struct_opt().expect("struct array")
+    }
+
+    /// Downcast this to a [`GenericListArray`] returning `None` if not possible
+    fn as_list_opt<O: OffsetSizeTrait>(&self) -> Option<&GenericListArray<O>>;
+
+    /// Downcast this to a [`GenericListArray`] panicking if not possible
+    fn as_list<O: OffsetSizeTrait>(&self) -> &GenericListArray<O> {
+        self.as_list_opt().expect("list array")
+    }
+
+    /// Downcast this to a [`MapArray`] returning `None` if not possible
+    fn as_map_opt(&self) -> Option<&MapArray>;
+
+    /// Downcast this to a [`MapArray`] panicking if not possible
+    fn as_map(&self) -> &MapArray {
+        self.as_map_opt().expect("map array")
+    }
+
+    /// Downcast this to a [`DictionaryArray`] returning `None` if not possible
+    fn as_dictionary_opt<K: ArrowDictionaryKeyType>(&self)
+        -> Option<&DictionaryArray<K>>;
+
+    /// Downcast this to a [`DictionaryArray`] panicking if not possible
+    fn as_dictionary<K: ArrowDictionaryKeyType>(&self) -> &DictionaryArray<K> {
+        self.as_dictionary_opt().expect("dictionary array")
+    }
+}
+
+impl private::Sealed for dyn Array + '_ {}
+impl AsArray for dyn Array + '_ {
+    fn as_boolean_opt(&self) -> Option<&BooleanArray> {
+        self.as_any().downcast_ref()

Review Comment:
   this is very clever 👍 
   
   



##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   this is much nicer -- I hit this with @appletreeisyellow when trying to downcast to a `BinaryArray` (having to know to put i32 was non obvious)



##########
arrow-array/src/cast.rs:
##########
@@ -709,6 +709,165 @@ where
     T::from(array.to_data())
 }
 
+mod private {
+    pub trait Sealed {}
+}
+
+/// An extension trait for `dyn Array` that provides ergonomic downcasting
+///
+/// ```
+/// # use std::sync::Arc;
+/// # use arrow_array::{ArrayRef, Int32Array};
+/// # use arrow_array::cast::AsArray;
+/// # use arrow_array::types::Int32Type;
+/// let col = Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef;
+/// assert_eq!(col.as_primitive::<Int32Type>().values(), &[1, 2, 3]);
+/// ```
+pub trait AsArray: private::Sealed {
+    /// Downcast this to a [`BooleanArray`] returning `None` if not possible
+    fn as_boolean_opt(&self) -> Option<&BooleanArray>;

Review Comment:
   I like how you have provided both `_opt` and non `_opt` variants so users can choose 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146783510


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   There is also `as_bytes::<Utf8Type>()` / `as_bytes::<BinaryType>`, etc... if that is more obvious for users? The `as_binary` and `as_string` are just short-cuts for the common case of only caring about a strings



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146783510


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   There is also `as_bytes::<Utf8Type>()` if that is more obvious for users?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146791259


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   I don't necessarily think the code needs to change. I think what helps the most is having some examples around that you can start from. So as always more docs / examples I think the better



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146706516


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   Err... The i32 is just being inferred because of the `: &BinaryArray` type constraint...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146783510


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   There is also `as_bytes::<Utf8Type>()` / `as_bytes::<BinaryType>`, etc... if that is more obvious for users? The `as_binary` and `as_string` are just short-cuts for the common case of only caring about one



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #3912: Add AsArray trait for more ergonomic downcasting

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #3912:
URL: https://github.com/apache/arrow-rs/pull/3912#discussion_r1146780311


##########
arrow-array/src/builder/generic_byte_run_builder.rs:
##########
@@ -368,7 +368,7 @@ pub type LargeStringRunBuilder<K> = GenericByteRunBuilder<K, LargeUtf8Type>;
 ///
 /// // Values are polymorphic and so require a downcast.
 /// let av = array.values();
-/// let ava: &BinaryArray = as_generic_binary_array::<i32>(av.as_ref());
+/// let ava: &BinaryArray = av.as_binary();

Review Comment:
   I am thinking as a user here 🤔 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org