Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/05 00:06:51 UTC

[GitHub] [iceberg] szehon-ho commented on pull request #4674: Spark-3.2: Avoid duplicate computation of ALL_MANIFESTS metadata table for spark actions

szehon-ho commented on PR #4674:
URL: https://github.com/apache/iceberg/pull/4674#issuecomment-1118039035

   @RussellSpitzer pointed me to this. I had a PR that is orthogonal to this one, to avoid duplicate computation of all_reachable_files: https://github.com/apache/iceberg/pull/3457/files. To me that was the bigger time consumer (exploring all reachable files), though I may need to redo that PR.
   
   Anyway, I agree with @RussellSpitzer that cache may be a better option than persist (great to see benchmarks for tables with huge snapshots). It would also be better to make it configurable, and to uncache it as soon as possible.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


For additional commands, e-mail: issues-help@iceberg.apache.org