You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "kylebarron (via GitHub)" <gi...@apache.org> on 2023/11/13 01:16:43 UTC

[PR] Add a `copy` method to `FFI_ArrowArray` [arrow-rs]

kylebarron opened a new pull request, #5070:
URL: https://github.com/apache/arrow-rs/pull/5070

   # Which issue does this PR close?
   
   Progress for https://github.com/apache/arrow-rs/issues/5067. I can create a different issue if you need.
   
   Closes #.
   
   # Rationale for this change
   
   There are cases when moving an FFI struct is necessary. In particular, I'm trying to implement #5067 to support the new [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). In that case, the PyCapsule object owns the ffi pointers, and only exposes a mutable reference to them. 
   
   The C Data Interface has a section about the [requirements of moving an array](https://arrow.apache.org/docs/format/CDataInterface.html#moving-an-array). In particular:
   
   > The consumer can move the ArrowArray structure by bitwise copying or shallow member-wise copying. Then it MUST mark the source structure released (see โ€œreleased structureโ€ above for how to do it) but without calling the release callback. This ensures that only one live copy of the struct is active at any given time and that lifetime is correctly communicated to the producer
   
   I don't think there's a current public API to set `release` to `None` without calling the callback.
   
   An alternative implementation of this would be to change `ffi::from_ffi` to take a mutable reference to `FFI_ArrowArray`, which would copy it under the hood. Adding a `copy` method like in this PR is likely to be less invasive?
   
   Additionally, it seems I can't implement `Copy` directly on `FFI_ArrowArray` because "`Copy` not allowed on types with destructors". 
   
   With this change, I no longer get segfaults when copying a pyarrow array ([I did previously](https://github.com/apache/arrow-rs/issues/5067#issuecomment-1807225673)):
   ![image](https://github.com/apache/arrow-rs/assets/15164633/71664b94-d0c0-4658-be15-b7e1052d1161)
   
   using this code:
   <details>
   
   ```rs
   #[pyfunction]
   pub fn read_array(ob: &'_ PyAny) -> PyResult<()> {
       let arr = pyobj_to_array(ob)?;
       println!("{:?}", arr);
       Ok(())
   }
   
   pub fn pyobj_to_array(ob: &'_ PyAny) -> PyResult<ArrayData> {
       if ob.hasattr("__arrow_c_array__")? {
           let tuple = ob.getattr("__arrow_c_array__")?.call0()?;
   
           if !tuple.is_instance_of::<PyTuple>() {
               return Err(PyTypeError::new_err(
                   "Expected __arrow_c_array__ to return a tuple.",
               ));
           }
   
           let schema_capsule = tuple.get_item(0)?;
           if !schema_capsule.is_instance_of::<PyCapsule>() {
               return Err(PyTypeError::new_err(
                   "Expected __arrow_c_array__ first element to be PyCapsule.",
               ));
           }
           let schema_capsule: &PyCapsule = PyTryInto::try_into(schema_capsule)?;
           let schema_capsule_name = schema_capsule.name()?;
           if schema_capsule_name.is_none() {
               return Err(PyValueError::new_err(
                   "Expected PyCapsule to have name set.",
               ));
           }
           let schema_capsule_name = schema_capsule_name.unwrap().to_str()?;
           if schema_capsule_name != "arrow_schema" {
               return Err(PyValueError::new_err(
                   "Expected name 'arrow_schema' in PyCapsule.",
               ));
           }
   
           let array_capsule = tuple.get_item(1)?;
           if !array_capsule.is_instance_of::<PyCapsule>() {
               return Err(PyTypeError::new_err(
                   "Expected __arrow_c_array__ second element to be PyCapsule.",
               ));
           }
           let array_capsule: &PyCapsule = PyTryInto::try_into(array_capsule)?;
           let array_capsule_name = array_capsule.name()?;
           if array_capsule_name.is_none() {
               return Err(PyValueError::new_err(
                   "Expected PyCapsule to have name set.",
               ));
           }
           let array_capsule_name = array_capsule_name.unwrap().to_str()?;
           if array_capsule_name != "arrow_array" {
               return Err(PyValueError::new_err(
                   "Expected name 'arrow_array' in PyCapsule.",
               ));
           }
           let array_ptr = array_capsule.pointer();
   
           let array_ptr = array_ptr as *mut ffi::FFI_ArrowArray;
           let owned_array_ptr = unsafe { array_ptr.as_mut().unwrap().copy() };
   
           unsafe {
               println!(
                   "is original released: {}",
                   array_ptr.as_mut().unwrap().is_released()
               );
           };
   
           let arr = unsafe {
               ffi::from_ffi(
                   owned_array_ptr,
                   schema_capsule.reference::<ffi::FFI_ArrowSchema>(),
               )
               .unwrap()
           };
           return Ok(arr);
       }
   
       Err(PyValueError::new_err(
           "Expected an object with dunder __arrow_c_array__",
       ))
   }
   ```
   
   <details>
   
   # What changes are included in this PR?
   
   Add a `copy()` method onto `FFI_ArrowArray`
   
   # Are there any user-facing changes?
   
   Add a `copy()` method onto `FFI_ArrowArray`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391399896


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Hmm, scratch that. PyO3's `new_with_destructor` actual wraps all its arguments using `Box::new`, so lifetime should be ok...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391093419


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   Ah, ok. No problem from me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391272641


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   The motivation for exporting this was to enable creating new PyCapsules from rust with a destructor that releases the array, but it looks like it should be possible to manually call `drop` without exporting the underlying function:
   
   ```rs
   fn pycapsule_array_destructor(ffi_array: FFI_ArrowArray, _capsule_context: *mut c_void) {
       drop(ffi_array)
   }
   
   let ffi_array = FFI_ArrowArray::new(...);
   let array_capsule_name = CString::new("arrow_array").unwrap();
   
   Python::with_gil(|py| {
       let capsule = PyCapsule::new_with_destructor(
           py,
           ffi_array,
           Some(array_capsule_name),
           pycapsule_array_destructor,
       )
       .unwrap();
   });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391489292


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   FWIW drop is actually a no-op. If you look at the definition it is just an empty function body that consumes the argument.
   
   You should be able to just use `PyCapsule::new` and it will work correctly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391026753


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   I feel fairly strongly we should not expose this method publicly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391390448


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Not a Rust programmer, but it seems you should do something like this:
   ```rust
   
   fn destroy_schema_capsule(val: *mut FFI_ArrowSchema, context: *mut c_void) {
       let boxed_schema = unsafe { *Box::from_raw(val) };
       // Will release boxed schema and heap-deallocate it
   }
   
   // Import the field into a heap-allocated FFI_ArrowSchema
   let ffi_schema = FFI_ArrowSchema::try_from(&*field).unwrap();
   let boxed_ffi_schema = Box::<FFI_ArrowSchema>::new();
   std::mem::replace(&*boxed_ffi_schema, ffi_schema);
   
   let schema_capsule_name = CString::new("arrow_schema").unwrap();
   
   Python::with_gil(|py| {
       let capsule = PyCapsule::new_with_destructor(
           py,
           boxed_ffi_schema.into_raw(),
           Some(schema_capsule_name),
           destroy_schema_capsule,
       )
       .unwrap();
   });
   ```
   



##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   (this would probably be simpler if there was a `FFI_ArrowSchema::try_import_from` or even a `FFI_ArrowSchema` constructor from a Field)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391286531


##########
arrow/src/pyarrow.rs:
##########
@@ -189,6 +244,29 @@ impl ToPyArrow for Schema {
 
 impl FromPyArrow for ArrayData {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        if value.hasattr("__arrow_c_array__")? {
+            let tuple = value.getattr("__arrow_c_array__")?.call0()?;
+
+            if !tuple.is_instance_of::<PyTuple>() {
+                return Err(PyTypeError::new_err(
+                    "Expected __arrow_c_array__ to return a tuple.",
+                ));
+            }
+
+            let schema_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(0)?)?;
+            let array_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(1)?)?;
+
+            validate_pycapsule(schema_capsule, "arrow_schema")?;
+            validate_pycapsule(array_capsule, "arrow_array")?;
+
+            let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
+            let array_ptr = unsafe { array_capsule.reference::<FFI_ArrowArray>() };
+
+            return ffi::from_ffi(array_ptr.copy(), schema_ptr).map_err(to_py_err);

Review Comment:
   Oh interesting, I didn't know about `std::mem::replace`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391288383


##########
arrow/src/pyarrow.rs:
##########
@@ -118,8 +118,39 @@ fn validate_class(expected: &str, value: &PyAny) -> PyResult<()> {
     Ok(())
 }
 
+fn validate_pycapsule(capsule: &PyCapsule, name: &str) -> PyResult<()> {
+    let capsule_name = capsule.name()?;
+    if capsule_name.is_none() {
+        return Err(PyValueError::new_err(
+            "Expected schema PyCapsule to have name set.",
+        ));
+    }
+
+    let capsule_name = capsule_name.unwrap().to_str()?;
+    if capsule_name != name {
+        return Err(PyValueError::new_err(format!(
+            "Expected name '{}' in PyCapsule.",
+            name,
+        )));
+    }
+
+    Ok(())
+}
+
 impl FromPyArrow for DataType {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.

Review Comment:
   Will do; I had taken this text directly from https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#backwards-compatibility-with-pyarrow ๐Ÿ˜‰ 



##########
arrow/src/pyarrow.rs:
##########
@@ -118,8 +118,39 @@ fn validate_class(expected: &str, value: &PyAny) -> PyResult<()> {
     Ok(())
 }
 
+fn validate_pycapsule(capsule: &PyCapsule, name: &str) -> PyResult<()> {
+    let capsule_name = capsule.name()?;
+    if capsule_name.is_none() {
+        return Err(PyValueError::new_err(
+            "Expected schema PyCapsule to have name set.",
+        ));
+    }
+
+    let capsule_name = capsule_name.unwrap().to_str()?;
+    if capsule_name != name {
+        return Err(PyValueError::new_err(format!(
+            "Expected name '{}' in PyCapsule.",
+            name,
+        )));
+    }
+
+    Ok(())
+}
+
 impl FromPyArrow for DataType {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.

Review Comment:
   Will do; I had taken this text directly from https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html#backwards-compatibility-with-pyarrow ๐Ÿ˜‰ 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391745519


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   I'm not familiar with the pycapsule internals, but one thing I wonder is whether it runs _any_ destructor if you use `PyCapsule::new` instead of `PyCapsule::new_with_destructor`. I.e. do you need even an empty 
   
   ```rs
   fn pycapsule_array_destructor(ffi_array: FFI_ArrowArray, _capsule_context: *mut c_void) {}
   ```
   to get the callback to take ownership again of the `ffi_array` to get rust to drop it? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812896334

   Three capsules makes sense to me to match tbe corresponding method names. SchemaCapsule, ArrayCapsule and StreamCapsule perhaps
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812908070

   Do you need the shims in arrow-rs? It might be enough to expose a function that creates and returns the capsules, and then another library that depends on arrow-rs (like geoarrow-rs) could use that to get the capsules which _they_ can expose on their objects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391033847


##########
arrow/src/pyarrow.rs:
##########
@@ -118,8 +118,39 @@ fn validate_class(expected: &str, value: &PyAny) -> PyResult<()> {
     Ok(())
 }
 
+fn validate_pycapsule(capsule: &PyCapsule, name: &str) -> PyResult<()> {
+    let capsule_name = capsule.name()?;
+    if capsule_name.is_none() {
+        return Err(PyValueError::new_err(
+            "Expected schema PyCapsule to have name set.",
+        ));
+    }
+
+    let capsule_name = capsule_name.unwrap().to_str()?;
+    if capsule_name != name {
+        return Err(PyValueError::new_err(format!(
+            "Expected name '{}' in PyCapsule.",
+            name,
+        )));
+    }
+
+    Ok(())
+}
+
 impl FromPyArrow for DataType {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.

Review Comment:
   You probably want to reference https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html in this comment.



##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   You could call this "move" for clarity.



##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   Agreed this doesn't seem useful.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391047696


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   This is avoided by setting `self.release` to None, no?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391026753


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   I feel fairly strongly we should not expose this method publicly, it is very hard to use correctly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812838217

   > Do you plan to expose the reverse operation, for example a `ToPyCapsule` trait that would allow exporting Rust data to Python without having PyArrow installed?
   
   Unless there are some changes from the second part of https://github.com/apache/arrow/issues/38010, to support this arrow-rs would need to add new Python classes to represent these objects and include the dunder methods, right? Is there appetite to do this, or do people feel that adding new classes would expand the scope of arrow-rs too much?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394641356


##########
arrow/src/pyarrow.rs:
##########
@@ -166,6 +211,19 @@ impl ToPyArrow for Field {
 
 impl FromPyArrow for Schema {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_schema__")? {
+            let capsule: &PyCapsule =
+                PyTryInto::try_into(value.getattr("__arrow_c_schema__")?.call0()?)?;
+            validate_pycapsule(capsule, "arrow_schema")?;
+
+            let schema_ptr = unsafe { capsule.reference::<FFI_ArrowSchema>() };

Review Comment:
   Aha, yes, that's true. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391080012


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   https://github.com/apache/arrow-rs/pull/5070/files#r1391079353 is how I would recommend handling this
   
   > The C Data Interface specification calls this "moving"
   
   Much like C++ moving has specific semantics in Rust



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808116860

   Yes, "copy" is a misnomer, that's why I suggested "move" instead as a method name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391489292


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   FWIW drop is actually a no-op. You should be able to just use `PyCapsule::new` and it will work correctly



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391410897


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Whilst it is never explicitly stated in the C Data interface, I think it is a fairly safe assumption that data can be sent between threads - i.e. it isn't relying on thread-local state or non-atomic reference counts, etc... Happy to be corrected on this front though.
   
   > I'm more confident that is true than Sync.
   
   Sync is definitely a stronger assumption that we don't appear to need to make here, although I also struggle to devise a sane example where it wouldn't hold.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808050320

   @wjones @milesgranger You may want to look at this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391288937


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   Sorry for the confusion about naming that `copy` ๐Ÿ˜… 



##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   Sorry for the confusion about naming that `copy` ๐Ÿ˜… 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812861225

   >  Is there appetite to do this, or do people feel that adding new classes would expand the scope of arrow-rs too much?
   
   From my understanding these classes would be little more than basic shims, in which case I don't see an issue, but I could be missing something


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1810458773

   I think this just needs some integration tests, and then it should be good to go, thank you all for helping out with this


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1813033806

   Let's continue the discussion of exporting to PyCapsules here: https://github.com/apache/arrow-rs/issues/5079


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1810617320

   Thanks! I'll try to get to integration tests tonight


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391392355


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   (this would probably be simpler if there was a `FFI_ArrowSchema::try_import_from` or even a `FFI_ArrowSchema` constructor from a Field)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812859587

   > I think it would be good to add some test coverage for the pre arrow-14 logic, if that isn't too much effort
   
   I added pyarrow 13 and 14 to the test matrix. So in pyarrow 13, we should see capsule-based tests be skipped but other tests pass.
   
   You might have a better suggestion for structuring that matrix, as it won't upgrade to pyarrow 15 when that's released?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1813071689

   > Unless there are some changes from the second part of [apache/arrow#38010](https://github.com/apache/arrow/issues/38010), to support this arrow-rs would need to add new Python classes to represent these objects and include the dunder methods, right?
   
   You don't need new Python classes, you would just return Python objects.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391027469


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Are we confident these are actually safe to be added?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391049509


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   Aah yes, in which case the issue is inverted. When the copy goes out of scope the source is left with references to released memory.
   
   Somewhere a reference count needs to be introduced



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391071400


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   Agreed, but that isn't what this method is actually implementing :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808126358

   Should there be some unit tests for this? You can trivially create objects exposing this protocol using PyArrow as a backend, for example:
   https://github.com/apache/arrow/blob/aa9f8ecdaf3cb8eb2fb478abff32fb96eb350305/python/pyarrow/tests/test_types.py#L1231-L1239
   https://github.com/apache/arrow/blob/aa9f8ecdaf3cb8eb2fb478abff32fb96eb350305/python/pyarrow/tests/test_table.py#L585-L590
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1811784975

   I added some tests. Do you want to test against old versions of pyarrow as well? We're effectively no longer testing `_export_to_c` and `_import_from_c` because the rust code will always prefer the pycapsule methods if they exist. Do you want to add an entry to the matrix to test pyarrow <14 as well?
   
   I had to bump the python version in CI to test with pyarrow 14 at all because pyarrow 14 [requires python >=3.8](https://pypi.org/project/pyarrow/#files).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808102014

   > It does not change anything to the core semantics.
   
   Ok this is what I needed to know, PyCapsule has the same release behaviour as the C Data interface, the copy and custom release behaviour led me to believe it was something reusable.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391358059


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   That code is incorrect, IIUC. `ffi_schema` is a stack-allocated Rust `FFI_ArrowSchema`. You're trying to store its pointer into a `PyCapsule` with Python-driven lifetime. It's a good thing that Rust is preventing it :-)
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391034569


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   This is inherently unsafe, when the source goes out of scope it will invoke release freeing the memory backing these copied arrays



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391280121


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   No, I can't say I understand Send/Sync well enough to be 100% confident this is safe.
   
   The motivation is that right now it's impossible to create a `PyCapsule` instance from an `FFI_ArrowSchema` because it's not `Send`. (See [`PyCapsule::new`](https://docs.rs/pyo3/latest/pyo3/types/struct.PyCapsule.html#method.new)).
   
   ```rs
   let ffi_schema = FFI_ArrowSchema::try_from(&*field).unwrap();
   let schema_capsule_name = CString::new("arrow_schema").unwrap();
   
   Python::with_gil(|py| {
       let capsule = PyCapsule::new(
           py,
           ffi_schema,
           Some(schema_capsule_name),
       )
       .unwrap();
   });
   ```
   
   gives
   
   ```
   error[E0277]: `*const i8` cannot be sent between threads safely
     --> src/array/point.rs:35:17
      |
   33 |             let capsule = PyCapsule::new(
      |                           -------------- required by a bound introduced by this call
   34 |                 py,
   35 |                 ffi_schema,
      |                 ^^^^^^^^^^ `*const i8` cannot be sent between threads safely
      |
      = help: within `arrow::ffi::FFI_ArrowSchema`, the trait `std::marker::Send` is not implemented for `*const i8`
   note: required because it appears within the type `FFI_ArrowSchema`
     --> /Users/kyle/.cargo/git/checkouts/arrow-rs-53e12341924f0a1c/b1c43a4/arrow-schema/src/ffi.rs:71:12
      |
   71 | pub struct FFI_ArrowSchema {
      |            ^^^^^^^^^^^^^^^
   note: required by a bound in `pyo3::types::PyCapsule::new`
     --> /Users/kyle/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.20.0/src/types/capsule.rs:78:29
      |
   78 |     pub fn new<T: 'static + Send + AssertNotZeroSized>(
      |                             ^^^^ required by this bound in `PyCapsule::new`
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808598964

   Unfortunately, looking at the PyO3 PyCapsule-wrapping APIs, this probably can't work, because PyO3 creates its own boxed structure for the capsule pointer:
   https://docs.rs/pyo3/latest/src/pyo3/types/capsule.rs.html#115-119
   
   Non-Rust implementations will expect the capsule pointer to point to a `ArrowSchema` or `ArrowArray` struct, but they will get a pointer to some arbitrary Rust structure instead.
   
   This seems to make PyO3's PyCapsule wrapper generally impractical for communicating with non-Rust runtimes.
   
   @milesgranger @birkenfeld Is my assessment above correct? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391414430


##########
arrow/src/pyarrow.rs:
##########
@@ -262,9 +263,10 @@ impl FromPyArrow for ArrayData {
             validate_pycapsule(array_capsule, "arrow_array")?;
 
             let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
-            let array_ptr = unsafe { array_capsule.reference::<FFI_ArrowArray>() };
+            let array_ptr = array_capsule.pointer() as *mut FFI_ArrowArray;
+            let array_mut = unsafe { array_ptr.as_mut() };

Review Comment:
   FWIW you could use https://doc.rust-lang.org/std/ptr/fn.replace.html as we're already playing with pointers here



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391434110


##########
arrow/src/pyarrow.rs:
##########
@@ -262,9 +263,10 @@ impl FromPyArrow for ArrayData {
             validate_pycapsule(array_capsule, "arrow_array")?;
 
             let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
-            let array_ptr = unsafe { array_capsule.reference::<FFI_ArrowArray>() };
+            let array_ptr = array_capsule.pointer() as *mut FFI_ArrowArray;
+            let array_mut = unsafe { array_ptr.as_mut() };

Review Comment:
   Updated in https://github.com/apache/arrow-rs/pull/5070/commits/e7ed58d1a8283319bad20563ea8fd45ea47b29af; is that what you meant?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391272641


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   The motivation for exporting this was to enable creating new PyCapsules from rust with a destructor that releases the array, but it looks like it should be possible to manually call `drop` without exporting the underlying function:
   
   ```rs
   fn pycapsule_array_destructor(ffi_array: FFI_ArrowArray, _capsule_context: *mut c_void) {
       drop(ffi_array)
   }
   
   let ffi_array = FFI_ArrowArray::new(&self.0.clone().into_array_ref().to_data());
   let array_capsule_name = CString::new("arrow_array").unwrap();
   
   Python::with_gil(|py| {
       let capsule = PyCapsule::new_with_destructor(
           py,
           ffi_array,
           Some(array_capsule_name),
           pycapsule_array_destructor,
       )
       .unwrap();
   });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391079353


##########
arrow/src/pyarrow.rs:
##########
@@ -189,6 +244,29 @@ impl ToPyArrow for Schema {
 
 impl FromPyArrow for ArrayData {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        if value.hasattr("__arrow_c_array__")? {
+            let tuple = value.getattr("__arrow_c_array__")?.call0()?;
+
+            if !tuple.is_instance_of::<PyTuple>() {
+                return Err(PyTypeError::new_err(
+                    "Expected __arrow_c_array__ to return a tuple.",
+                ));
+            }
+
+            let schema_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(0)?)?;
+            let array_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(1)?)?;
+
+            validate_pycapsule(schema_capsule, "arrow_schema")?;
+            validate_pycapsule(array_capsule, "arrow_array")?;
+
+            let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
+            let array_ptr = unsafe { array_capsule.reference::<FFI_ArrowArray>() };
+
+            return ffi::from_ffi(array_ptr.copy(), schema_ptr).map_err(to_py_err);

Review Comment:
   ```suggestion
               let array = std::mem::replace(array_ptr, FFI_ArrowArray::empty());
               return ffi::from_ffi(array, schema_ptr).map_err(to_py_err);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391080012


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   https://github.com/apache/arrow-rs/pull/5070/files#r1391079353 is how I would recommend handling this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391412900


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {
+        let new = Self {
+            length: self.length,
+            null_count: self.null_count,
+            offset: self.offset,
+            n_buffers: self.n_buffers,
+            n_children: self.n_children,
+            buffers: self.buffers,
+            children: self.children,
+            dictionary: self.dictionary,
+            release: self.release,
+            private_data: self.private_data,
+        };
+        self.release = None;
+        new

Review Comment:
   I removed this method in favor of `std::mem::replace`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808089235

   > Perhaps we could get some of the people who designed PyCapsule to weigh in here, from my reading of it this appears to primarily be an API for python libraries to communicate with each other?
   
   It is designed as a Python-specific protocol (in the Python sense: it uses well-known special methods) around the C Data Interface, and automates conversion between producers and consumers of Arrow data. There is no requirement that the producer or consumer is implemented in Python.
   
   The two main points of this protocol are:
   1) the protocol does not depend on any particular implementation of Arrow; the consumer can consume any PyCapsule provider in an implementation-agnostic manner (it does not need to write any PyArrow-specific code, for example)
   2) the PyCapsule has a destructor that releases the underlying C Data Interface structure if it wasn't moved already; this makes this protocol safer than passing raw pointers as integers (if an exception occurs between exporting and importing, for example, a leak is avoided)
   
   In practice:
   1) provider P exposes a Python object that has a `__arrow_c_schema__` method
   2) consumer C is invoked with said Python object, calls `__arrow_c_schema__` and imports the C Data Interface structure pointed to by the returned PyCapsule; consumer C does know anything about provider P
   
   > I also am very confused by what the ownership semantics of this API are, does the PyCapsule remain the owner?
   
   The PyCapsule just contains a raw pointer. The semantics of the pointer embedded in a PyCapsule are usage-dependent. In this case, the ownership semantics are those of the C Data Interface itself: the contents can be moved by setting the `release` pointer to NULL.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394087460


##########
arrow/src/pyarrow.rs:
##########
@@ -247,6 +329,40 @@ impl<T: ToPyArrow> ToPyArrow for Vec<T> {
 
 impl FromPyArrow for RecordBatch {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_array__")? {
+            let tuple = value.getattr("__arrow_c_array__")?.call0()?;
+
+            if !tuple.is_instance_of::<PyTuple>() {
+                return Err(PyTypeError::new_err(
+                    "Expected __arrow_c_array__ to return a tuple.",
+                ));
+            }
+
+            let schema_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(0)?)?;
+            let array_capsule: &PyCapsule = PyTryInto::try_into(tuple.get_item(1)?)?;
+
+            validate_pycapsule(schema_capsule, "arrow_schema")?;
+            validate_pycapsule(array_capsule, "arrow_array")?;
+
+            let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
+            let array_ptr = array_capsule.pointer() as *mut FFI_ArrowArray;
+            let ffi_array = unsafe { std::ptr::replace(array_ptr, FFI_ArrowArray::empty()) };
+            let array_data = ffi::from_ffi(ffi_array, schema_ptr).map_err(to_py_err)?;
+            let array_ref = make_array(array_data);
+
+            if !matches!(array_ref.data_type(), DataType::Struct(_)) {
+                return Err(PyTypeError::new_err(
+                    "Expected Struct type from __arrow_c_array.",
+                ));
+            }
+
+            let array = array_ref.as_any().downcast_ref::<StructArray>().unwrap();

Review Comment:
   ```suggestion
               if !matches!(array_data.data_type(), DataType::Struct(_)) {
                   return Err(PyTypeError::new_err(
                       "Expected Struct type from __arrow_c_array.",
                   ));
               }
               let array = StructArray::from(array_data);
   ```



##########
arrow/src/pyarrow.rs:
##########
@@ -247,6 +329,40 @@ impl<T: ToPyArrow> ToPyArrow for Vec<T> {
 
 impl FromPyArrow for RecordBatch {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_array__")? {

Review Comment:
   I think this will allow passing a `StructArray` to a method accepting `PyArrowType<RecordBatch>` provided the `StructArray` isn't nullable. I don't think this is a problem, RecordBatch is a bit of an oddity, but just an observation 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394459884


##########
arrow/src/pyarrow.rs:
##########
@@ -247,6 +329,40 @@ impl<T: ToPyArrow> ToPyArrow for Vec<T> {
 
 impl FromPyArrow for RecordBatch {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_array__")? {

Review Comment:
   Yeah... I'm not sure if there's a standard way in Arrow to discern between a StructArray intended to be an array, vs a StructArray intended to represent a RecordBatch



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "kylebarron (via GitHub)" <gi...@apache.org>.
kylebarron commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1812871584

   > From my understanding these classes would be little more than basic shims, in which case I don't see an issue, but I could be missing something
   
   At first glance, I thought it would open a can of worms, but maybe not... Would we only have three shims? One per pycapsule type? So exporting a data type/field/schema would all be exported as an `ArrowSchema` class; exporting Array/RecordBatch would be exported as `ArrowArray` and Table/RecordBatchReader would be exported as `ArrowStream`?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 merged PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808037347

   Perhaps we could get some of the people who designed PyCapsule to weigh in here, from my reading of it this appears to primarily be an API for python libraries to communicate with each other? I also am very confused by what the ownership semantics of this API are, does the PyCapsule remain the owner? This feels like it can't traverse an FFI interface safely, so I am very confused as to how this is supposed to work...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391077343


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   The C Data Interface specification calls this "moving": https://arrow.apache.org/docs/format/CDataInterface.html#moving-an-array
   
   (from my basic reading that is what is being implemented here, but I am not a rust developer, so I can't say that for sure ;))



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391067046


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {

Review Comment:
   `self.release` being None indicates that the source has been moved or released, so shouldn't be used anymore. This is a general design principle in the C Data Interface:
   https://arrow.apache.org/docs/format/CDataInterface.html#released-structure
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808095923

   No, the _explicit_ contract is that a C Data Interface struct can only be used once :-)
   https://arrow.apache.org/docs/format/CDataInterface.html#released-structure
   
   The PyCapsule is just a Python-safe way of transporting that struct. It does not change anything to the core semantics.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391386468


##########
arrow-data/src/ffi.rs:
##########
@@ -282,6 +286,27 @@ impl FFI_ArrowArray {
         // If dictionary is not null should be valid for reads of `Self`
         unsafe { self.dictionary.as_ref() }
     }
+
+    /// Create a copy of an existing `FFI_ArrowArray`
+    ///
+    /// As required by the C Data Interface specification, this sets the `release` member of `Self`
+    /// to `None`, but without calling the release callback.
+    pub fn copy(&mut self) -> Self {
+        let new = Self {
+            length: self.length,
+            null_count: self.null_count,
+            offset: self.offset,
+            n_buffers: self.n_buffers,
+            n_children: self.n_children,
+            buffers: self.buffers,
+            children: self.children,
+            dictionary: self.dictionary,
+            release: self.release,
+            private_data: self.private_data,
+        };
+        self.release = None;
+        new

Review Comment:
   Do we still use this?
   
   Is this any better than just `std::mem::replace(schema, FFI_ArrowSchema::empty())`?
   



##########
arrow/src/pyarrow.rs:
##########
@@ -118,8 +118,39 @@ fn validate_class(expected: &str, value: &PyAny) -> PyResult<()> {
     Ok(())
 }
 
+fn validate_pycapsule(capsule: &PyCapsule, name: &str) -> PyResult<()> {
+    let capsule_name = capsule.name()?;
+    if capsule_name.is_none() {
+        return Err(PyValueError::new_err(
+            "Expected schema PyCapsule to have name set.",
+        ));
+    }
+
+    let capsule_name = capsule_name.unwrap().to_str()?;
+    if capsule_name != name {
+        return Err(PyValueError::new_err(format!(
+            "Expected name '{}' in PyCapsule.",
+            name,

Review Comment:
   Always nice to have the actual value whenever possible.
   ```suggestion
               "Expected name '{}' in PyCapsule, instead got '{}'",
               name, capsule_name
   ```



##########
arrow/src/pyarrow.rs:
##########


Review Comment:
   Potential refactor for later, but it would be nice to move the Capsule imports to their own functions that aren't named as if to by PyArrow-specific. Right now this module is named for PyArrow, but the import logic will apply to other Arrow implementations as well.



##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Are we able to just add `Send`. I'm more confident that is true than `Sync`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808093901

   > The PyCapsule just contains a raw pointer. The semantics of the pointer embedded in a PyCapsule are usage-dependent. In this case, the ownership semantics are those of the C Data Interface itself: the contents can be moved by setting the release pointer to NULL.
   
   Ok, so there is an implicit contract that a PyCapsule is never used multiple times?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#issuecomment-1808129103

   > Should there be some unit tests for this
   
   The arrow-pyarrow-integration-testing contains some testing infrastructure for this sort of test


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391390448


##########
arrow-schema/src/ffi.rs:
##########
@@ -351,6 +355,9 @@ impl Drop for FFI_ArrowSchema {
     }
 }
 
+unsafe impl Send for FFI_ArrowSchema {}
+unsafe impl Sync for FFI_ArrowSchema {}

Review Comment:
   Not a Rust programmer, but it seems you should do something like this:
   ```rust
   
   fn destroy_schema_capsule(val: *mut FFI_ArrowSchema, context: *mut c_void) {
       let boxed_schema = unsafe { *Box::from_raw(val) };
       // Will release boxed schema and heap-deallocate it
   }
   
   // Import the field into a heap-allocated FFI_ArrowSchema
   let ffi_schema = FFI_ArrowSchema::try_from(&*field).unwrap();
   let boxed_ffi_schema = Box::<FFI_ArrowSchema>::new();
   std::mem::replace(&*boxed_ffi_schema, ffi_schema);
   
   let schema_capsule_name = CString::new("arrow_schema").unwrap();
   
   Python::with_gil(|py| {
       let capsule = PyCapsule::new_with_destructor(
           py,
           boxed_ffi_schema.into_raw(),
           Some(schema_capsule_name),
           destroy_schema_capsule,
       )
       .unwrap();
   });
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] WIP: Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1391751340


##########
arrow-data/src/ffi.rs:
##########
@@ -66,8 +66,12 @@ impl Drop for FFI_ArrowArray {
 unsafe impl Send for FFI_ArrowArray {}
 unsafe impl Sync for FFI_ArrowArray {}
 
-// callback used to drop [FFI_ArrowArray] when it is exported
-unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {
+/// callback used to drop [FFI_ArrowArray] when it is exported
+///
+/// # Safety
+///
+/// Must be passed a valid [FFI_ArrowArray].
+pub unsafe extern "C" fn release_array(array: *mut FFI_ArrowArray) {

Review Comment:
   > I'm not familiar with the pycapsule internals, but one thing I wonder is whether it runs _any_ destructor if you use `PyCapsule::new` instead of `PyCapsule::new_with_destructor`.
   
   Yes, it seems it does.
   https://github.com/PyO3/pyo3/blob/c8fdb806300ff82747f58bcb9ca08abd3da480b4/src/types/capsule.rs#L273-L282
   and
   https://github.com/PyO3/pyo3/blob/c8fdb806300ff82747f58bcb9ca08abd3da480b4/src/types/capsule.rs#L78-L84



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "wjones127 (via GitHub)" <gi...@apache.org>.
wjones127 commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394462666


##########
.github/workflows/integration.yml:
##########
@@ -106,6 +106,7 @@ jobs:
     strategy:
       matrix:
         rust: [ stable ]
+        pyarrow: [ "13", "14" ]

Review Comment:
   ๐Ÿ‘ 
   We might also want to leave a comment here so we remember why we test 13 specifically.
   
   ```suggestion
           # PyArrow 13 was the last version prior to introduction to Arrow PyCapsules
           pyarrow: [ "13", "14" ]
   ```



##########
.github/workflows/integration.yml:
##########
@@ -106,6 +106,7 @@ jobs:
     strategy:
       matrix:
         rust: [ stable ]
+        pyarrow: [ "13", "14" ]

Review Comment:
   ๐Ÿ‘ 
   We might also want to leave a comment here so we remember why we test 13 specifically.
   
   ```suggestion
           # PyArrow 13 was the last version prior to introduction to Arrow PyCapsules
           pyarrow: [ "13", "14" ]
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394631990


##########
arrow/src/pyarrow.rs:
##########
@@ -166,6 +211,19 @@ impl ToPyArrow for Field {
 
 impl FromPyArrow for Schema {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_schema__")? {
+            let capsule: &PyCapsule =
+                PyTryInto::try_into(value.getattr("__arrow_c_schema__")?.call0()?)?;
+            validate_pycapsule(capsule, "arrow_schema")?;
+
+            let schema_ptr = unsafe { capsule.reference::<FFI_ArrowSchema>() };

Review Comment:
   By the way, I'm curious... does it actually drop the FFI_ArrowSchema so as to call it's release pointer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394636318


##########
arrow/src/pyarrow.rs:
##########
@@ -166,6 +211,19 @@ impl ToPyArrow for Field {
 
 impl FromPyArrow for Schema {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_schema__")? {
+            let capsule: &PyCapsule =
+                PyTryInto::try_into(value.getattr("__arrow_c_schema__")?.call0()?)?;
+            validate_pycapsule(capsule, "arrow_schema")?;
+
+            let schema_ptr = unsafe { capsule.reference::<FFI_ArrowSchema>() };

Review Comment:
   Does the pycapsule not handle this, as we don't _move_ the data out of it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Implement Arrow PyCapsule Interface [arrow-rs]

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5070:
URL: https://github.com/apache/arrow-rs/pull/5070#discussion_r1394636318


##########
arrow/src/pyarrow.rs:
##########
@@ -166,6 +211,19 @@ impl ToPyArrow for Field {
 
 impl FromPyArrow for Schema {
     fn from_pyarrow(value: &PyAny) -> PyResult<Self> {
+        // Newer versions of PyArrow as well as other libraries with Arrow data implement this
+        // method, so prefer it over _export_to_c.
+        // See https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html
+        if value.hasattr("__arrow_c_schema__")? {
+            let capsule: &PyCapsule =
+                PyTryInto::try_into(value.getattr("__arrow_c_schema__")?.call0()?)?;
+            validate_pycapsule(capsule, "arrow_schema")?;
+
+            let schema_ptr = unsafe { capsule.reference::<FFI_ArrowSchema>() };

Review Comment:
   Does the pycapsule not handle this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org