Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/02/01 22:26:55 UTC

[GitHub] [arrow] wjones127 commented on issue #33986: [python][rust]Create extension point in python for Dataset/Scanner

wjones127 commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1412824647

   I've had similar challenges with supporting datasets in delta-rs. Another aspect you'll need to think about is supporting Filesystems. In Rust, that means calling into Python functions, which I fear can be sub-optimal because of the GIL. I don't think there's a practical way to directly access the underlying C++ implemented FS unless we made the ABI stable (which I don't see us doing in the foreseeable future).
   
   A route I'm exploring right now is using ADBC as a stable ABI for pushing down scan queries to storage formats and systems. It probably makes more sense for table formats like Delta Lake, which have database-like semantics, than for file formats like Lance (which I assume is the project with the use case you are discussing).
   
   > if possible, make the top level abstraction pure python, so subclasses doesn't need to deal with cython etc if coming from Rust
   
   It's harder, but part of me would prefer a stable C ABI, because it would mean the extension could be used in any language, not just Python.
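
To make the idea of a C ABI extension point concrete, here is a minimal sketch using Python's `ctypes`. It is purely illustrative: `ScanStream`, `next_batch`, `make_stream`, and `consume` are hypothetical names, not part of Arrow, ADBC, or delta-rs. The point is only that the consumer sees a struct of C function pointers and does not need to know which language filled them in.

```python
import ctypes


# Hypothetical ABI (an assumption for illustration): a struct of C function
# pointers plus an opaque state pointer. Any language that can produce C
# function pointers could fill this in.
class ScanStream(ctypes.Structure):
    pass


# int next_batch(ScanStream*, int64_t* out_num_rows)
# Returns 0 on success; out_num_rows == 0 signals end of stream.
NEXT_BATCH = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ScanStream), ctypes.POINTER(ctypes.c_int64)
)
# void release(ScanStream*)
RELEASE = ctypes.CFUNCTYPE(None, ctypes.POINTER(ScanStream))

ScanStream._fields_ = [
    ("next_batch", NEXT_BATCH),
    ("release", RELEASE),
    ("private_data", ctypes.c_void_p),
]


def make_stream(batch_sizes):
    """Producer side: here implemented in Python, but it could be Rust or C++."""
    remaining = list(batch_sizes)
    state = {"released": False}

    def next_batch(stream, out_num_rows):
        # Write the next batch size through the out-pointer, or 0 when done.
        out_num_rows[0] = remaining.pop(0) if remaining else 0
        return 0

    def release(stream):
        state["released"] = True

    callbacks = (NEXT_BATCH(next_batch), RELEASE(release))
    # Keep the callback objects referenced for as long as the stream is used.
    state["callbacks"] = callbacks
    stream = ScanStream(callbacks[0], callbacks[1], None)
    return stream, state


def consume(stream):
    """Consumer side: only touches the function pointers in the struct."""
    total_rows = 0
    num_rows = ctypes.c_int64()
    while True:
        status = stream.next_batch(ctypes.byref(stream), ctypes.byref(num_rows))
        if status != 0:
            raise RuntimeError(f"scan failed with status {status}")
        if num_rows.value == 0:
            break
        total_rows += num_rows.value
    stream.release(ctypes.byref(stream))
    return total_rows
```

In this sketch the producer happens to be Python, so every callback still runs under the GIL; a producer written in Rust or C++ would fill in the same struct without going through Python at all.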

