Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/06 16:42:50 UTC

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8287: ARROW-10111: [Rust] Added new crate with code that consumes C Data interface

jorgecarleitao edited a comment on pull request #8287:
URL: https://github.com/apache/arrow/pull/8287#issuecomment-704388901


   I have been working heavily on this problem, based on your ideas, @pitrou, on a [separate branch](https://github.com/jorgecarleitao/arrow/pull/13/files), and I think I need some input.
   
   That code is still a mess, as I am still in the design/experimentation phase. What it can do so far:
   
   1. import an array from Python and perform arbitrary operations on it
   2. export an array to Python and perform operations on it (from Python) ...
   
   Step 2 causes a double free and crashes when Python releases the resource. I know why and I am working on it. While working on it, I found a catch on which I would very much welcome your input.
   
   Currently, in Rust, two distinct arrays can share a buffer via an (atomically counted) shared pointer, `Arc`.
   
   Say we have two arrays `A` and `B` that share a buffer. When we export array `A`, I think that our release cannot just `free` the buffer: the refcount would be ignored and `B` could be left with a dangling pointer. Instead, it seems that we need to keep track of the refcounts.
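
   To make the situation concrete, here is a minimal sketch. `Buffer`, `Array` and `shared_arrays` are hypothetical stand-ins invented for illustration, not the types in the crate:

   ```rust
   use std::sync::Arc;

   // Hypothetical stand-ins for the real buffer and array types.
   struct Buffer {
       data: Vec<u8>,
   }

   struct Array {
       buffer: Arc<Buffer>,
   }

   /// Build two arrays, `A` and `B`, that share a single buffer.
   fn shared_arrays() -> (Array, Array) {
       let buffer = Arc::new(Buffer { data: vec![1, 2, 3, 4] });
       // `Arc::clone` copies the pointer and bumps the strong count;
       // the bytes themselves are not copied.
       let a = Array { buffer: Arc::clone(&buffer) };
       let b = Array { buffer };
       (a, b)
   }
   ```

   With both arrays alive, `Arc::strong_count` on the buffer reports 2. Dropping `A` only brings it down to 1, so `B` can still read the data. A release callback that freed the allocation directly would bypass that count.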
   
   In this direction, exporting an array (without children for now) is equivalent to increasing the refcount by 1, and releasing the exported array is equivalent to decreasing it by 1. Specifically, exporting an array consists of:
   
   1. for each buffer in the array, manually increase its `Arc`'s (strong) refcount by 1
   2. store the memory location of each `Arc` in private data
   3. build the ABI struct with the private data and expose the pointer to Python/whatever
   
   This is artificially stating that our struct now also shares read ownership over that data. Because the refcount was increased by 1, Rust won't free the resources automatically.
   
   Releasing an Array consists of:
   
   1. read private data and interpret parts of it as `Arc`s
   2. decrease the (strong) refcount of each `Arc` by 1
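
   The export and release steps above could be sketched roughly as follows. This is only an illustration under assumptions: `Buffer`, `Array`, `ExportedArray`, `PrivateData`, `export` and `release_exported` are hypothetical names, and `ExportedArray` keeps only two of the fields of the real ABI struct:

   ```rust
   use std::os::raw::c_void;
   use std::sync::Arc;

   // Hypothetical stand-ins for the real buffer and array types.
   struct Buffer {
       data: Vec<u8>,
   }

   struct Array {
       buffers: Vec<Arc<Buffer>>,
   }

   // Simplified ABI struct: only the two fields this sketch needs.
   #[repr(C)]
   struct ExportedArray {
       release: Option<unsafe extern "C" fn(*mut ExportedArray)>,
       private_data: *mut c_void,
   }

   // What `private_data` points to: one raw `Arc` pointer per buffer.
   struct PrivateData {
       buffers: Vec<*const Buffer>,
   }

   /// Export: add one strong reference per buffer and remember where it lives.
   fn export(array: &Array) -> ExportedArray {
       let buffers = array
           .buffers
           .iter()
           // `Arc::clone` adds 1 to the strong count; `Arc::into_raw` keeps
           // that reference alive without running the destructor.
           .map(|b| Arc::into_raw(Arc::clone(b)))
           .collect();
       let private = Box::new(PrivateData { buffers });
       ExportedArray {
           release: Some(release_exported),
           private_data: Box::into_raw(private) as *mut c_void,
       }
   }

   /// Release: rebuild each `Arc` from private data and drop it (count - 1).
   unsafe extern "C" fn release_exported(exported: *mut ExportedArray) {
       unsafe {
           let exported = &mut *exported;
           if exported.release.is_none() {
               return; // already released, so do not touch the counts again
           }
           let private = Box::from_raw(exported.private_data as *mut PrivateData);
           for ptr in private.buffers.iter() {
               drop(Arc::from_raw(*ptr));
           }
           exported.private_data = std::ptr::null_mut();
           // Mark the struct as released.
           exported.release = None;
       }
   }
   ```

   In this sketch the buffer is freed only when the last `Arc` goes away, whichever side that happens on: if Rust drops `A` first, the exported reference keeps the data alive, and if the consumer releases first, `B` is unaffected. Clearing `release` at the end makes a second call a no-op instead of a double free.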
   
   Does this make any sense?
   
   Btw, is this what is meant [in this section of the C Data interface](https://arrow.apache.org/docs/format/CDataInterface.html#release-callback-semantics-for-producers) with respect to `shared_ptr`s?

