Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/03 06:18:49 UTC

[GitHub] [arrow-rs] hu6360567 opened a new issue #994: Confused import/export API when interacting with C++

hu6360567 opened a new issue #994:
URL: https://github.com/apache/arrow-rs/issues/994


   **Which part is this question about**
   FFI apis and raw pointer management
   
   **Describe your question**
   I'm trying to import/export Arrow arrays between C++ and Rust, but the program crashes when the C++ side is compiled in release mode.
   ref: https://github.com/apache/arrow/issues/11846
   
   What confuses me is that, when importing/exporting an array/schema on the Rust side, the input pointers are `const`.
   However, the C++ API `arrow::ImportRecordBatch` moves the payload out of the input array/schema pointers.
   Is there any code that shows best practice for import/export between Rust and C++?
   
   My implementation for exporting an Arrow array allocated in Rust:
   Rust part:
   ```rust
   fn export<T>(array: T, content: *mut *const FFI_ArrowArray, schema: *mut *const FFI_ArrowSchema) where T: Array {
       match array.to_raw() {
           Ok((c, s)) => unsafe {
               // Is it right to copy the pointer addresses directly? When are `c` and `s` actually dropped?
               // Should I use `copy_to` or `replace` from `std::ptr` instead?
               *content = c;
               *schema = s;
           }
           Err(e) => {
               eprintln!("{}", e);
           }
       }
   }
   
   #[no_mangle]
   pub extern "C" fn export_array(content: *mut *const FFI_ArrowArray, schema: *mut *const FFI_ArrowSchema) {
       let sa = StringArray::from(vec!["a", "b", "c"]);
   
       export(sa, content, schema);
   }
   ```
   C++ part:
   ```C++
   void func(){
       const ArrowArray *content = nullptr;
       const ArrowSchema *schema = nullptr;
       export_array(&content, &schema);
   
       // I have to const_cast the pointers
       auto array = *arrow::ImportRecordBatch(const_cast<ArrowArray *>(content), const_cast<ArrowSchema *>(schema));
       std::cout << array->ToString() << std::endl;
   }
   ```
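   
   For context on why the C++ import functions take non-`const` pointers: in the Arrow C Data Interface, "moving" a struct means copying it bitwise to the destination and then marking the source as released by setting its `release` callback to null. The following is only a minimal sketch of that rule. `FakeArray`, `release_fake`, and `move_array` are hypothetical stand-ins, not arrow-rs or Arrow C++ APIs, and the real `ArrowArray` struct has more fields.
   
   ```rust
   use std::ptr;
   
   // Hypothetical stand-in for the C `ArrowArray` struct (only two fields kept).
   #[repr(C)]
   struct FakeArray {
       length: i64,
       release: Option<unsafe extern "C" fn(*mut FakeArray)>,
   }
   
   // Hypothetical release callback. A real producer would free its buffers
   // here before marking the struct as released.
   unsafe extern "C" fn release_fake(array: *mut FakeArray) {
       unsafe { (*array).release = None };
   }
   
   // Move the struct out of `src`: bitwise copy, then mark the source released.
   // The write to `src` is why a consumer needs a mutable pointer.
   unsafe fn move_array(src: *mut FakeArray) -> FakeArray {
       unsafe {
           let moved = ptr::read(src);
           (*src).release = None;
           moved
       }
   }
   
   fn main() {
       let mut exported = FakeArray { length: 3, release: Some(release_fake) };
       let mut imported = unsafe { move_array(&mut exported) };
   
       // The source is now marked released; the destination owns the payload.
       assert!(exported.release.is_none());
       assert_eq!(imported.length, 3);
   
       // The new owner calls the release callback exactly once.
       if let Some(release) = imported.release {
           unsafe { release(&mut imported) };
       }
       assert!(imported.release.is_none());
   }
   ```
   
   Under this rule, a pointer that is only ever read cannot be the source of a move, which matches the need for `const_cast` in the C++ snippet above.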
   
   
   **Additional context**
   ref: https://github.com/apache/arrow/issues/11846
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
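   
   The three steps above can be sketched with the standard library alone. This is an illustration under simplifying assumptions, not the arrow-rs implementation: `FakeArray`, `produce`, and `import_from_producer` are hypothetical names, `Box` is used where arrow-rs uses `Arc`, and the producer is an ordinary Rust function standing in for the foreign side.
   
   ```rust
   // Hypothetical stand-in for the C `ArrowArray` struct.
   #[repr(C)]
   #[derive(Default)]
   struct FakeArray {
       length: i64,
       null_count: i64,
   }
   
   // Hypothetical stand-in for the foreign producer: it fills the struct
   // that the caller allocated.
   extern "C" fn produce(out: *mut FakeArray) {
       unsafe {
           (*out).length = 3;
           (*out).null_count = 0;
       }
   }
   
   fn import_from_producer() -> FakeArray {
       // 1. Allocate an empty struct and leak its pointer to the other side.
       let raw: *mut FakeArray = Box::into_raw(Box::new(FakeArray::default()));
   
       // 2. The other side writes its data through the pointer.
       produce(raw);
   
       // 3. Take ownership back from the same pointer, so the allocation
       //    from step 1 is freed by Rust when the box is consumed.
       let boxed = unsafe { Box::from_raw(raw) };
       *boxed
   }
   
   fn main() {
       let array = import_from_producer();
       assert_eq!(array.length, 3);
       assert_eq!(array.null_count, 0);
   }
   ```
   
   The point of the pattern is that the side that allocates the struct in step 1 is also the side that frees it in step 3.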
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
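   
   The standard-library behaviour behind this question can be shown in isolation. The sketch below does not settle whether arrow-rs leaks in this path; it only illustrates that a pointer obtained from `Arc::into_raw` keeps its allocation alive until some Rust code rebuilds the `Arc` with `Arc::from_raw`. `Payload`, `DROPS`, `leak`, and `reclaim` are hypothetical names introduced for the example.
   
   ```rust
   use std::sync::atomic::{AtomicUsize, Ordering};
   use std::sync::Arc;
   
   // Counts how many `Payload` values have been dropped.
   static DROPS: AtomicUsize = AtomicUsize::new(0);
   
   // Hypothetical payload standing in for an FFI struct held in an `Arc`.
   struct Payload(u32);
   
   impl Drop for Payload {
       fn drop(&mut self) {
           DROPS.fetch_add(1, Ordering::SeqCst);
       }
   }
   
   // Hand out a raw pointer; the strong count stays at 1, so nothing is freed.
   fn leak() -> *const Payload {
       Arc::into_raw(Arc::new(Payload(7)))
   }
   
   // Rebuild the `Arc` from the raw pointer and drop it, freeing the allocation.
   // Safety: `ptr` must come from `leak` and must not be reclaimed twice.
   unsafe fn reclaim(ptr: *const Payload) {
       drop(unsafe { Arc::from_raw(ptr) });
   }
   
   fn main() {
       let before = DROPS.load(Ordering::SeqCst);
       let ptr = leak();
   
       // A reader on the other side can use the pointee, but cannot free it.
       assert_eq!(unsafe { (*ptr).0 }, 7);
       assert_eq!(DROPS.load(Ordering::SeqCst), before);
   
       // Only rebuilding the `Arc` releases the allocation.
       unsafe { reclaim(ptr) };
       assert_eq!(DROPS.load(Ordering::SeqCst), before + 1);
   }
   ```
   
   If no code ever calls the equivalent of `reclaim`, the `Arc` allocation is never freed, which is the situation the question asks about.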
   
   
   My proposal:
   ```rust 
   pub fn export_array<T>(array: &T, content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<()> where T: Array {
       if content.is_null() { Err(ArrowError::MemoryError("content is null".to_string()))? }
       if schema.is_null() { Err(ArrowError::MemoryError("schema is null".to_string()))? }
       let (content_ptr, schema_ptr) = array.to_raw().unwrap();
   
       // swap content/content_ptr, schema/schema_ptr
       unsafe {
           content.swap(content_ptr as *mut FFI_ArrowArray);
           schema.swap(schema_ptr as *mut FFI_ArrowSchema);
       }
   
       // release content_ptr/schema_ptr
       unsafe {
           let _ = ArrowArray::try_from_raw(content_ptr, schema_ptr);
       }
   
       Ok(())
   }
   
   pub fn import_array(content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<ArrayRef> {
       let empty_array = unsafe { ArrowArray::empty() };
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(empty_array);
   
       unsafe {
           content.swap(content_ptr as *mut FFI_ArrowArray);
           schema.swap(schema_ptr as *mut FFI_ArrowSchema);
       }
   
       unsafe { make_array_from_raw(content_ptr, schema_ptr) }
   }
   ```





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   
   
   My proposal:
   ```rust 
   pub type Result<T> = std::result::Result<T, error::Error>;
   
   
   pub(crate) fn export_array(array: ArrowArray, content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<()> {
       if content.is_null() { Err(ArrowError::MemoryError("content is null".to_string()))? }
       if schema.is_null() { Err(ArrowError::MemoryError("schema is null".to_string()))? }
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(array);
   
       // swap content/content_ptr, schema/schema_ptr, like C++ std::unique_ptr 
       unsafe {
           content.swap(content_ptr as *mut FFI_ArrowArray);
           schema.swap(schema_ptr as *mut FFI_ArrowSchema);
       }
   
       // release content_ptr/schema_ptr
       unsafe {
           let _ = ArrowArray::try_from_raw(content_ptr, schema_ptr);
       }
   
       Ok(())
   }
   
   pub(crate) fn import_array(content: *const FFI_ArrowArray, schema: *const FFI_ArrowSchema) -> Result<ArrowArray> {
       let empty_array = unsafe { ArrowArray::empty() };
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(empty_array);
   
       unsafe {
           content.copy_to(content_ptr as *mut FFI_ArrowArray, 1);
           schema.copy_to(schema_ptr as *mut FFI_ArrowSchema, 1);
       }
   
       unsafe { Ok(ArrowArray::try_from_raw(content_ptr, schema_ptr)?) }
   }
   ```





[GitHub] [arrow-rs] hu6360567 commented on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 commented on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L132
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   
   
   My proposal:
   ```rust 
   pub type Result<T> = std::result::Result<T, error::Error>;
   
   
   pub(crate) fn export_array(array: ArrowArray, content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<()> {
       if content.is_null() { Err(ArrowError::MemoryError("content is null".to_string()))? }
       if schema.is_null() { Err(ArrowError::MemoryError("schema is null".to_string()))? }
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(array);
   
       // write to mutable content/schema
       unsafe {
           content_ptr.copy_to(content, 1);
           schema_ptr.copy_to(schema, 1);
       }
   
       // release content_ptr/schema_ptr
       unsafe {
           let _ = ArrowArray::try_from_raw(content_ptr, schema_ptr);
       }
   
       Ok(())
   }
   
   pub(crate) fn import_array(content: *const FFI_ArrowArray, schema: *const FFI_ArrowSchema) -> Result<ArrowArray> {
       let empty_array = unsafe { ArrowArray::empty() };
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(empty_array);
   
       unsafe {
           content.copy_to(content_ptr as *mut FFI_ArrowArray, 1);
           schema.copy_to(schema_ptr as *mut FFI_ArrowSchema, 1);
       }
   
       unsafe { Ok(ArrowArray::try_from_raw(content_ptr, schema_ptr)?) }
   }
   ```





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   
   
   My proposal:
   ```rust 
   pub(crate) fn export_array<T>(array: T, content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<()> where T: Array {
       if content.is_null() { Err(ArrowError::MemoryError("content is null".to_string()))? }
       if schema.is_null() { Err(ArrowError::MemoryError("schema is null".to_string()))? }
       let (content_ptr, schema_ptr) = array.to_raw().unwrap();
       println!("content{:p}, content_ptr{:p}, schema{:p}, schema_ptr{:p}", content, content_ptr, schema, schema_ptr);
   
       // swap content/content_ptr, schema/schema_ptr
       unsafe {
           content.swap(content_ptr as *mut FFI_ArrowArray);
           schema.swap(schema_ptr as *mut FFI_ArrowSchema);
       }
   
       // release content_ptr/schema_ptr
       unsafe {
           let _ = ArrowArray::try_from_raw(content_ptr, schema_ptr);
       }
   
       Ok(())
   }
   
   pub(crate) fn import_array(content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<ArrowArray> {
       let empty_array = unsafe { ArrowArray::empty() };
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(empty_array);
   
       unsafe {
           content.swap(content_ptr as *mut FFI_ArrowArray);
           schema.swap(schema_ptr as *mut FFI_ArrowSchema);
       }
   
       unsafe { Ok(ArrowArray::try_from_raw(content_ptr, schema_ptr)?) }
   }
   ```





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   





[GitHub] [arrow-rs] hu6360567 edited a comment on issue #994: Confused import/export API when interacting with C++

Posted by GitBox <gi...@apache.org>.
hu6360567 edited a comment on issue #994:
URL: https://github.com/apache/arrow-rs/issues/994#issuecomment-985254643


   In `pyarrow.rs`, I found a snippet for import/export over FFI:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L110-L149
   
   The intended workflow for importing an array over FFI:
   1. Prepare an empty `ArrowArray` and leak both of its pointers to the FFI side.
   2. The FFI side writes the C Data Interface structs into both pointers.
   3. Import from both pointers, which were allocated in the first step.
   
   
   The safety notes for `ArrowArray` are here:
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/ffi.rs#L638-L643
   
   For the workflow of exporting to FFI, however, I could not find where the pointers created by `into_raw` are released.
   https://github.com/apache/arrow-rs/blob/e9be49d962560ce5b87544a2933d8b207322cf60/arrow/src/pyarrow.rs#L136
   C++ `arrow::ImportArray` moves the payload out of the pointer but cannot free the pointer itself, since it was allocated by an `Arc` in Rust.
   Does this lead to a memory leak?
   
   
   My proposal:
   ```rust 
   pub type Result<T> = std::result::Result<T, error::Error>;
   
   
   pub(crate) fn export_array(array: ArrowArray, content: *mut FFI_ArrowArray, schema: *mut FFI_ArrowSchema) -> Result<()> {
       if content.is_null() { Err(ArrowError::MemoryError("content is null".to_string()))? }
       if schema.is_null() { Err(ArrowError::MemoryError("schema is null".to_string()))? }
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(array);
   
       // write to mutable content/schema
       unsafe {
           content_ptr.copy_to(content, 1);
           schema_ptr.copy_to(schema, 1);
       }
   
       // release content_ptr/schema_ptr
       unsafe {
           let _ = ArrowArray::try_from_raw(content_ptr, schema_ptr);
       }
   
       Ok(())
   }
   
   pub(crate) fn import_array(content: *const FFI_ArrowArray, schema: *const FFI_ArrowSchema) -> Result<ArrowArray> {
       let empty_array = unsafe { ArrowArray::empty() };
       let (content_ptr, schema_ptr) = ArrowArray::into_raw(empty_array);
   
       unsafe {
           content.copy_to(content_ptr as *mut FFI_ArrowArray, 1);
           schema.copy_to(schema_ptr as *mut FFI_ArrowSchema, 1);
       }
   
       unsafe { Ok(ArrowArray::try_from_raw(content_ptr, schema_ptr)?) }
   }
   ```

