Posted to issues@iceberg.apache.org by "Fokko (via GitHub)" <gi...@apache.org> on 2023/04/05 08:51:37 UTC

[GitHub] [iceberg] Fokko commented on issue #7067: Polars Based Compute Engine

Fokko commented on issue #7067:
URL: https://github.com/apache/iceberg/issues/7067#issuecomment-1497141806

   Thanks @chitralverma for chiming in here.
   
   > I was looking to do this integration over the weekend. It will be a quick addition because py-iceberg already allows a table to be converted to a pyarrow table which can be fed to Polars' eager read API. No need to rely on to_pandas which may incur additional overhead.
   
   That sounds like a great first step. The important part is that we push down the predicate from Polars into PyIceberg. Iceberg is designed to work with large tables, and not being able to prune files would result in very poor performance.
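
A minimal sketch of that eager path, assuming `table` is a PyIceberg `Table` already loaded from a catalog. The helper names `plan_filtered_scan` and `scan_to_polars` are hypothetical and not part of either library. The filter and column projection are handed to `table.scan(...)` so that PyIceberg can prune files during planning, and only the surviving rows are materialised as Arrow and passed to Polars:

```python
from typing import Any, Sequence


def plan_filtered_scan(
    table: Any, row_filter: str, columns: Sequence[str] = ("*",)
) -> Any:
    """Build a scan that carries the filter and projection into PyIceberg.

    `table` is assumed to be a pyiceberg Table; `row_filter` is a filter
    expression string such as "trip_distance >= 10.0".
    """
    return table.scan(row_filter=row_filter, selected_fields=tuple(columns))


def scan_to_polars(scan: Any) -> Any:
    """Materialise the planned scan as Arrow and wrap it in a Polars DataFrame."""
    import polars as pl  # imported lazily so planning does not require Polars

    return pl.from_arrow(scan.to_arrow())


# Hypothetical usage (catalog and table names are placeholders):
#   from pyiceberg.catalog import load_catalog
#   table = load_catalog("default").load_table("nyc.taxis")
#   scan = plan_filtered_scan(table, "trip_distance >= 10.0", ["trip_distance"])
#   df = scan_to_polars(scan)
```

The filter here is supplied by the caller rather than extracted from a Polars query, so this only covers the eager case; deriving the predicate from a Polars expression is the part that would need the lazy scan API.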
   
   > However, it would be great to support the lazy scan API as well, because most internal optimisations take place over there.
   
   I fully agree. I think that would be a great second step, but it would probably be a bit more complex. We don't yet integrate with Arrow in the ideal way, but we're working on it (this will probably take some time). This would require that, whenever an action is performed on a dataset, it calls PyIceberg to do the planning (and apply all the Iceberg optimizations).
   
   I'm happy to help, but I'm less familiar with Polars, so it would be awesome if you could work on the integration on that side 🚀 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

