Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/05/24 13:29:37 UTC

[GitHub] [arrow-adbc] jorisvandenbossche commented on pull request #702: feat(python/adbc_driver_manager): experiment with using PyCapsules

jorisvandenbossche commented on PR #702:
URL: https://github.com/apache/arrow-adbc/pull/702#issuecomment-1561157777

   Small showcase:
   
   ```python
   import adbc_driver_sqlite.dbapi
   import pyarrow as pa
   
   conn = adbc_driver_sqlite.dbapi.connect()
   cursor = conn.cursor()
   
   # using those private methods for now, to get a handle object
   # (instead of already a pyarrow object)
   cursor._prepare_execute("SELECT 1 as a, 2.0 as b, 'Hello, world!' as c")
   handle, _ = cursor._stmt.execute_query()
   
   # manually getting the capsule and passing it to pyarrow for now
   capsule = handle._to_capsule()
   pa.RecordBatchReader._import_from_c_capsule(capsule).read_all()
   # pyarrow.Table
   # a: int64
   # b: double
   # c: string
   # ----
   # a: [[1]]
   # b: [[2]]
   # c: [["Hello, world!"]]
   
   # trying to import it a second time raises an error
   pa.RecordBatchReader._import_from_c_capsule(capsule).read_all()
   # ...
   # ArrowInvalid: Cannot import released ArrowArrayStream
   
   # when the capsule object gets deleted/collected -> release callback is not called
   # because it was already consumed
   del capsule
   
   # but when the stream was not consumed, the capsule deleter will call the release callback
   cursor._prepare_execute("SELECT 1 as a, 2.0 as b, 'Hello, world!' as c")
   handle, _ = cursor._stmt.execute_query()
   capsule = handle._to_capsule()
   del capsule
   # calling the release
   ```
   
   Some design questions about this for the adbc manager side:
   
- Currently the DBAPI methods like `execute(..)` already initialize the pyarrow RecordBatchReader. We might want to delay that creation until `fetch_arrow_table()` actually gets called (or one of the other fetch variants that end up consuming the RecordBatchReader as well).
  And then we could for example have a `fetch_arrow_stream()` method that gives you some custom object with the appropriate protocol method like `__arrow_c_stream__` (instead of the current dummy `_to_capsule()`)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org