
Posted to github@arrow.apache.org by "robtandy (via GitHub)" <gi...@apache.org> on 2023/12/06 22:38:19 UTC

Re: [I] [Python][FlightRPC] Is it possible to use pyarrow.dataset as an abstraction over arrow-flight data? [arrow]

robtandy commented on issue #37278:
URL: https://github.com/apache/arrow/issues/37278#issuecomment-1843801054

   @lidavidm Thanks for responding. I think I have the same need as @balshetzer, and I am not sure I understand the tooling perfectly.
   
   I would like to use DuckDB to query across both older data, held in Parquet files, and newer data, held in Apache Arrow Tables hosted on a fleet of Arrow Flight servers.
   
   My understanding, from https://duckdb.org/docs/guides/python/sql_on_arrow and from experimentation, is that I need to construct a Dataset whose Fragments represent not only Parquet files but also the RecordBatchReaders returned from Arrow Flight.
   
   AFAICT this isn't possible at the moment, at least with pyarrow: a Dataset cannot be created from a RecordBatchReader.
   
   If I understand correctly, FlightSQL and ADBC facilitate communication between the database and its clients, whereas the Dataset provides an abstraction over the source data for the database itself.

