Posted to github@arrow.apache.org by "wjones127 (via GitHub)" <gi...@apache.org> on 2023/06/22 04:45:16 UTC

[GitHub] [arrow-datafusion-python] wjones127 commented on issue #414: Show documentation how to use Delta table

wjones127 commented on issue #414:
URL: https://github.com/apache/arrow-datafusion-python/issues/414#issuecomment-1602003194

   > I think Delta rust is using Datafusion internally
   
   There are three senses in which we integrate with DataFusion:
   
   1. We use DataFusion components inside our own functions.
   2. We have a plugin for Rust DataFusion, but it can only be used from Rust.
   3. We can export PyArrow datasets, which datafusion-python can read.
   
   It's only the third one that applies to this library.
   
   > I could not find any documentation though how to use Delta table with Python datafusion
   
   Our integration with Python DataFusion works the same way as our DuckDB integration: create a PyArrow dataset from the Delta table, register it with DataFusion, and query it as desired.
   
   ```python
   from datafusion import SessionContext
   from deltalake import DeltaTable
   
   # Create a DataFusion context
   ctx = SessionContext()
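   
   # Open the Delta table and expose it as a PyArrow dataset.
   delta_table = DeltaTable("path/to/your/table")
   
   # register_dataset takes the table name first, then the dataset.
   ctx.register_dataset("my_table", delta_table.to_pyarrow_dataset())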
   
   df = ctx.sql("SELECT * FROM my_table")
   ```

