Posted to issues@iceberg.apache.org by "Fokko (via GitHub)" <gi...@apache.org> on 2023/05/15 14:59:39 UTC

[GitHub] [iceberg] Fokko commented on issue #7598: Expose PyIceberg table as PyArrow Dataset

Fokko commented on issue #7598:
URL: https://github.com/apache/iceberg/issues/7598#issuecomment-1548031781

   @wjones127 Thanks for raising this and doing all the work. I've added some comments to the Google doc and also the pull request that describes the interface.
   
   @corleyma has a good point here. I think the main reason Arrow doesn't have an Iceberg implementation today is that it takes quite a lot of work to get the details right, and the details are what make Iceberg so performant.
   
   As https://github.com/apache/arrow/issues/33986 suggests, I think it would be great for PyIceberg to be able to produce and consume Substrait plans. It could consume a high-level plan, such as `SELECT * FROM s3://bucket/path/json@snapshot-id WHERE date_field >= 2023-01-01`, and produce a low-level plan that tells Arrow which files to read and what projection to apply. This will get complex, though: for example, how would we express [positional deletes](https://github.com/apache/iceberg/pull/6775)? It can be done, but I assume it would need some changes to Substrait.
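   To make the split between the two plan levels concrete, here is a minimal pure-Python sketch. All names in it (`DataFile`, `FileTask`, `plan_scan`, `execute`) are hypothetical and are not PyIceberg, PyArrow, or Substrait APIs. The planner side (PyIceberg, in this proposal) prunes files using a per-file upper bound on `date_field` and attaches positional deletes, modeled as `(file_path, pos)` pairs, to each file it keeps. The reader side (Arrow) then only has to skip deleted positions, apply the residual filter, and project columns. The per-file `deleted_positions` set is the kind of information a plain "files plus projection" plan would need a way to carry.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date

# Hypothetical, simplified types for illustration only; these are not
# PyIceberg, PyArrow, or Substrait APIs.


@dataclass
class DataFile:
    path: str
    max_date: date  # assumed upper bound of date_field for this file (like manifest stats)
    rows: list      # in-memory rows standing in for the file's contents


@dataclass
class FileTask:
    """One entry of the low-level plan: a file to read plus deleted row positions."""
    path: str
    rows: list
    deleted_positions: set = field(default_factory=set)


def plan_scan(files, delete_entries, min_date):
    """Turn 'WHERE date_field >= min_date' into a list of file tasks.

    delete_entries holds (file_path, pos) pairs, modeling positional deletes.
    """
    deletes = defaultdict(set)
    for file_path, pos in delete_entries:
        deletes[file_path].add(pos)
    tasks = []
    for f in files:
        if f.max_date < min_date:
            continue  # prune: no row in this file can satisfy the filter
        tasks.append(FileTask(f.path, f.rows, deletes[f.path]))
    return tasks


def execute(tasks, columns, min_date):
    """What the reader (Arrow, in the proposal) would do with the low-level plan."""
    out = []
    for task in tasks:
        for pos, row in enumerate(task.rows):
            if pos in task.deleted_positions:
                continue  # row removed by a positional delete
            if row["date_field"] >= min_date:  # residual row-level filter
                out.append({c: row[c] for c in columns})  # projection
    return out


# Usage with made-up data: file a is pruned, row 1 of file b is deleted,
# and row 2 of file b fails the residual filter.
files = [
    DataFile("s3://bucket/a.parquet", date(2022, 12, 31),
             [{"id": 1, "date_field": date(2022, 12, 31)}]),
    DataFile("s3://bucket/b.parquet", date(2023, 1, 2),
             [{"id": 2, "date_field": date(2023, 1, 1)},
              {"id": 3, "date_field": date(2023, 1, 2)},
              {"id": 4, "date_field": date(2022, 6, 1)}]),
]
tasks = plan_scan(files, [("s3://bucket/b.parquet", 1)], date(2023, 1, 1))
print([t.path for t in tasks])
print(execute(tasks, ["id"], date(2023, 1, 1)))
```

   Keeping pruning and delete resolution on the planner side means the reader never needs to understand Iceberg metadata; it only needs a way to receive the per-file delete positions.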
   
   That said, I'm all for seeing whether we can integrate PyIceberg into Arrow. I agree that exposing the table as a dataset is the ideal outcome. If there is anything you want me to try, please let me know; I'm happy to help and see if we can make this work.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org