You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/20 12:04:51 UTC

[GitHub] [arrow-rs] crepererum opened a new pull request, #2902: add `Array::as_any_arc`

crepererum opened a new pull request, #2902:
URL: https://github.com/apache/arrow-rs/pull/2902

   # Which issue does this PR close?
   
   Closes #2901.
   
   # Rationale for this change
    
   Allow users to keep downcasted arrays arround. This is useful when you wanna return an array from a method for which you know the type and the caller wants to used the typed array (e.g. for certain kernels).
   
   # What changes are included in this PR?
   
   New method `Array::as_any_arc`.
   
   # Are there any user-facing changes?
   New method `Array::as_any_arc`. Since this method is required by all implementations, this is a **breaking change**.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1001197106


##########
arrow-array/src/array/mod.rs:
##########
@@ -258,6 +283,10 @@ impl Array for ArrayRef {
         self.as_ref().as_any()
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   I was really confused how this was working, and then realised this is actually implementing `as_any_arc` for `Arc<Arc<dyn Array>>` :sweat_smile: 
   
   Perhaps this should return `None` to be consistent with `as_any` above which calls through to the inner array?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003636271


##########
arrow-array/src/array/mod.rs:
##########
@@ -258,6 +283,10 @@ impl Array for ArrayRef {
         self.as_ref().as_any()
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   > as_any "above" (i.e. in the same impl) returns None
   
   No it returns `self.as_ref().as_any()`. My point here is much like the other cases, we only have a reference to the underlying `dyn Array` here through an `Arc`. As written this method will return a `Arc<dyn Any>` that will downcast to an `Arc<ArrayRef>`, as opposed to `as_any()` which will downcast to `PrimitiveArray`, etc... I think this is a bit off perhaps? Given we have the ability to return `None` for cases where "this doesn't make sense" perhaps we could use it here?
   
   Alternatively we could clone the inner `ArrayRef` and downcast that :thinking: 
   
   
   > TBH I don't really like this solution but couldn't come up with something more elegant
   
   Ditto, hence my suggestion to hide it away in a private trait so we can revisit it later perhaps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
crepererum commented on PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#issuecomment-1315042634

   #2902 got closed using a different implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003629274


##########
arrow-array/src/array/mod.rs:
##########
@@ -88,6 +88,31 @@ pub trait Array: std::fmt::Debug + Send + Sync {
     /// ```
     fn as_any(&self) -> &dyn Any;
 
+    /// Returns the array as [`Any`](std::any::Any) wrapped into an [`Arc`] so that it can be
+    /// downcasted to a specific implementation without the need for an unsized reference. This is helpful if you want
+    /// to return a concrete array type.
+    /// 
+    /// Note that this conversion may not be possible for all types due to the `'static` constraint.
+    ///
+    /// # Example:
+    ///
+    /// ```
+    /// # use std::sync::Arc;
+    /// # use arrow_array::{Int32Array, RecordBatch};
+    /// # use arrow_schema::{Schema, Field, DataType, ArrowError};
+    ///
+    /// let id = Int32Array::from(vec![1, 2, 3, 4, 5]);
+    /// let batch = RecordBatch::try_new(
+    ///     Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)])),
+    ///     vec![Arc::new(id)]
+    /// ).unwrap();
+    ///
+    /// let int32array = Arc::downcast::<Int32Array>(
+    ///     Arc::clone(batch.column(0)).as_any_arc().expect("Failed to create static Arc")
+    /// ).expect("Failed to downcast");

Review Comment:
   Yes, the nature of the way that the downcasting works means that Array is effectively sealed as there would be no use to implementing it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1001204909


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -520,6 +521,10 @@ impl<T: ArrowPrimitiveType> Array for DictionaryArray<T> {
         self
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   I think this would still work with Option<Arc<dyn Any>>



##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -520,6 +521,10 @@ impl<T: ArrowPrimitiveType> Array for DictionaryArray<T> {
         self
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   I think this would still work with `Option<Arc<dyn Any>>`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum closed pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
crepererum closed pull request #2902: add `Array::as_any_arc`
URL: https://github.com/apache/arrow-rs/pull/2902


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
crepererum commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003459174


##########
arrow-array/src/array/mod.rs:
##########
@@ -258,6 +283,10 @@ impl Array for ArrayRef {
         self.as_ref().as_any()
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   `as_any` "above" (i.e. in the same `impl`) returns `None`. We only return `None` for two cases where we have a references that are not `'static` (TBH I don't really like this solution but couldn't come up with something more elegant).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
crepererum commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003457471


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -520,6 +521,10 @@ impl<T: ArrowPrimitiveType> Array for DictionaryArray<T> {
         self
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   I don't get your question. Is that that you're wondering about `Send + Sync + 'static`? They are required for `Arc::downcast`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003636271


##########
arrow-array/src/array/mod.rs:
##########
@@ -258,6 +283,10 @@ impl Array for ArrayRef {
         self.as_ref().as_any()
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   > as_any "above" (i.e. in the same impl) returns None
   
   No it returns `self.as_ref().as_any()`. My point here is much like the other cases, we only have a reference to the underlying `dyn Array` here through an `Arc`. As written this method will return a `Arc<dyn Any>` that will downcast to an `Arc<ArrayRef>`, as opposed to `as_any()` which will downcast to `PrimitiveArray`, etc... I think this is a bit off perhaps? Given we have the ability to return `None` for cases where "this doesn't make sense" perhaps we could use it here?
   
   
   > TBH I don't really like this solution but couldn't come up with something more elegant
   
   Ditto, hence my suggestion to hide it away in a private trait so we can revisit it later perhaps



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#issuecomment-1295927428

   Marking as draft for now as I'm not sure what the consensus on the path forward here is. I think tcloning the ArrayData is a workable solution currently, and we can continue to provide statically typed versions of kernels to reduce the need for this (#2967)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1001199307


##########
arrow-array/src/array/mod.rs:
##########
@@ -88,6 +88,31 @@ pub trait Array: std::fmt::Debug + Send + Sync {
     /// ```
     fn as_any(&self) -> &dyn Any;
 
+    /// Returns the array as [`Any`](std::any::Any) wrapped into an [`Arc`] so that it can be
+    /// downcasted to a specific implementation without the need for an unsized reference. This is helpful if you want
+    /// to return a concrete array type.
+    /// 
+    /// Note that this conversion may not be possible for all types due to the `'static` constraint.
+    ///
+    /// # Example:
+    ///
+    /// ```
+    /// # use std::sync::Arc;
+    /// # use arrow_array::{Int32Array, RecordBatch};
+    /// # use arrow_schema::{Schema, Field, DataType, ArrowError};
+    ///
+    /// let id = Int32Array::from(vec![1, 2, 3, 4, 5]);
+    /// let batch = RecordBatch::try_new(
+    ///     Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)])),
+    ///     vec![Arc::new(id)]
+    /// ).unwrap();
+    ///
+    /// let int32array = Arc::downcast::<Int32Array>(
+    ///     Arc::clone(batch.column(0)).as_any_arc().expect("Failed to create static Arc")
+    /// ).expect("Failed to downcast");

Review Comment:
   I wonder if we should provide a `downcast_array_ref` method in `cast` that can abstract away this logic (and potentially allow hiding this method by putting it on a super trait within a private module) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003631551


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -520,6 +521,10 @@ impl<T: ArrowPrimitiveType> Array for DictionaryArray<T> {
         self
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   Oh the Send + Sync is required, but I think you can drop the `'static` as it is implied by `Any` (and lifetime erasure for trait objects)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1001044293


##########
arrow-array/src/array/dictionary_array.rs:
##########
@@ -597,6 +602,10 @@ impl<'a, K: ArrowPrimitiveType, V: Sync> Array for TypedDictionaryArray<'a, K, V
         self.dictionary
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {
+        None

Review Comment:
   Hmm... Perhaps we should remove the requirement for ArrayAccessor to impl Array :thinking: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
tustvold commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1001197106


##########
arrow-array/src/array/mod.rs:
##########
@@ -258,6 +283,10 @@ impl Array for ArrayRef {
         self.as_ref().as_any()
     }
 
+    fn as_any_arc(self: Arc<Self>) -> Option<Arc<dyn Any + Send + Sync + 'static>> {

Review Comment:
   I was really confused how this was working, and then realised this is actually implementing `as_any_arc` for `Arc<Arc<dyn Array>>` :sweat_smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] crepererum commented on a diff in pull request #2902: add `Array::as_any_arc`

Posted by GitBox <gi...@apache.org>.
crepererum commented on code in PR #2902:
URL: https://github.com/apache/arrow-rs/pull/2902#discussion_r1003454609


##########
arrow-array/src/array/mod.rs:
##########
@@ -88,6 +88,31 @@ pub trait Array: std::fmt::Debug + Send + Sync {
     /// ```
     fn as_any(&self) -> &dyn Any;
 
+    /// Returns the array as [`Any`](std::any::Any) wrapped into an [`Arc`] so that it can be
+    /// downcasted to a specific implementation without the need for an unsized reference. This is helpful if you want
+    /// to return a concrete array type.
+    /// 
+    /// Note that this conversion may not be possible for all types due to the `'static` constraint.
+    ///
+    /// # Example:
+    ///
+    /// ```
+    /// # use std::sync::Arc;
+    /// # use arrow_array::{Int32Array, RecordBatch};
+    /// # use arrow_schema::{Schema, Field, DataType, ArrowError};
+    ///
+    /// let id = Int32Array::from(vec![1, 2, 3, 4, 5]);
+    /// let batch = RecordBatch::try_new(
+    ///     Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)])),
+    ///     vec![Arc::new(id)]
+    /// ).unwrap();
+    ///
+    /// let int32array = Arc::downcast::<Int32Array>(
+    ///     Arc::clone(batch.column(0)).as_any_arc().expect("Failed to create static Arc")
+    /// ).expect("Failed to downcast");

Review Comment:
   Adding a helpers sounds great.
   
   Regarding privacy: do we consider `Array` to be a sealed trait (because this would be the implication)? Currently users can implement their own `Array` types if they want.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org