Posted to github@arrow.apache.org by "changhiskhan (via GitHub)" <gi...@apache.org> on 2023/02/09 19:57:21 UTC

[GitHub] [arrow] changhiskhan commented on issue #33986: [Python][Rust] Create extension point in python for Dataset/Scanner

changhiskhan commented on issue #33986:
URL: https://github.com/apache/arrow/issues/33986#issuecomment-1424733395

   > The toplevel proposal sounds like sidestepping that entirely by introducing a separate abstraction layer at the Python level (hence, exposing ABCs in Python).
   
   Yup, that's exactly the proposal here.
   
   > I don't think they had a choice, because there's not really a formal API for what they really want :)
   
   The main blocker in the current version of DuckDB is that it calls static methods such as `Scanner.from_dataset`. I made a PR to change that to use the instance method `dataset.to_scanner` instead. So in the next release it will be *possible* for Rust packages to disguise themselves as pyarrow datasets to DuckDB.
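
To illustrate why the switch to an instance method matters, here is a toy sketch that does not use pyarrow or DuckDB at all. `NativeScanner`, `NativeDataset`, `RustBackedDataset`, and the two `consume_*` helpers are hypothetical stand-ins, and `to_scanner` simply mirrors the name used above; none of this is the real implementation.

```python
class NativeScanner:
    """Hypothetical stand-in for a scanner backed by a native object."""

    def __init__(self, rows):
        self.rows = rows

    @staticmethod
    def from_dataset(dataset):
        # Static entry point: it reaches into the native object's internals,
        # so it only works for genuine NativeDataset instances.
        if not isinstance(dataset, NativeDataset):
            raise TypeError("expected a NativeDataset")
        return NativeScanner(dataset._rows)


class NativeDataset:
    """Hypothetical stand-in for the built-in dataset class."""

    def __init__(self, rows):
        self._rows = list(rows)

    def to_scanner(self):
        return NativeScanner.from_dataset(self)


class RustBackedDataset:
    """Hypothetical look-alike, e.g. a wrapper around a Rust extension."""

    def __init__(self, rows):
        self._foreign_rows = list(rows)

    def to_scanner(self):
        # The look-alike decides for itself how to build a scanner.
        return NativeScanner(self._foreign_rows)


def consume_static(dataset):
    # Consumer calling the static method: look-alikes are rejected.
    return NativeScanner.from_dataset(dataset).rows


def consume_instance(dataset):
    # Consumer calling the instance method: anything with to_scanner() works.
    return dataset.to_scanner().rows
```

With the instance-method style, the consumer never needs to know which class it was handed, which is what lets a third-party package stand in for the built-in one.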
   
   The issue is that you have to be really careful to override all of the methods in `Dataset`, or else it will try to unwrap the non-existent `CDataset` and crash Python. This is my main motivation for proposing a pure Python abstraction on top.
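
A minimal sketch of what such a pure Python layer could look like, using only the standard library `abc` module. The class names (`AbstractDataset`, `AbstractScanner`) and the method set (`schema`, `to_scanner`, `to_batches`) are assumptions made for illustration, not an agreed design or an existing pyarrow API.

```python
from abc import ABC, abstractmethod
from typing import Any, Iterator


class AbstractScanner(ABC):
    """Hypothetical scanner interface."""

    @abstractmethod
    def to_batches(self) -> Iterator[Any]:
        """Yield record batches (plain Python objects in this sketch)."""


class AbstractDataset(ABC):
    """Hypothetical dataset interface."""

    @property
    @abstractmethod
    def schema(self) -> Any:
        """Describe the columns of the dataset."""

    @abstractmethod
    def to_scanner(self, **kwargs) -> AbstractScanner:
        """Build a scanner over this dataset."""


class ListScanner(AbstractScanner):
    def __init__(self, batches):
        self._batches = batches

    def to_batches(self):
        return iter(self._batches)


class ListDataset(AbstractDataset):
    """Complete implementation: every abstract member is overridden."""

    def __init__(self, names, batches):
        self._names = names
        self._batches = batches

    @property
    def schema(self):
        return self._names

    def to_scanner(self, **kwargs):
        return ListScanner(self._batches)


class IncompleteDataset(AbstractDataset):
    """Forgets to_scanner, so it cannot even be instantiated."""

    @property
    def schema(self):
        return ["x"]
```

The point of the design is the failure mode: instantiating `IncompleteDataset()` raises an ordinary `TypeError` at construction time, rather than a missing override surfacing later as a crash inside native code.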
   
   
   Would y'all be open to accepting a PR for this? Or is there a more formal process for proposing the details?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org