Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/05/19 02:20:12 UTC

[GitHub] [iceberg] kbendick commented on pull request #4736: WIP: Improve performance of expire snapshot by not double-scanning non-expired manifests

kbendick commented on PR #4736:
URL: https://github.com/apache/iceberg/pull/4736#issuecomment-1131020429

   Thanks for tagging me Anton.
   
   I think this is a good idea as well.
   
   I won’t repeat what others have said, but I have some questions and possibly a few additional ideas. I need to work through some theoretical cases on paper first.
   
   > One potential problem with that is that we will load the manifest list for every expired snapshot on the driver, which can become a bottleneck if we expire a lot of snapshots. I've seen such cases.
   
   Two thoughts:
   
   1) We should add an event / metric describing this replanning work. It could be used as a signal to perform table maintenance.
   2) We might be able to track a metric to determine whether this initial replanning work should be done in a distributed manner.
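
   To make the two thoughts concrete, here is a rough sketch of what such an event and decision could look like. Everything in it is an assumption for illustration: `ExpireReplanEvent`, `ReplanPolicy`, and the threshold are hypothetical names, not existing Iceberg classes or settings. It only relies on the point quoted above, that one manifest list is loaded per expired snapshot.

```java
// Hypothetical event describing the driver-side replanning work.
// Not an existing Iceberg class; the fields are illustrative only.
final class ExpireReplanEvent {
  private final String tableName;
  private final int expiredSnapshotCount;
  private final long planningMillis;

  ExpireReplanEvent(String tableName, int expiredSnapshotCount, long planningMillis) {
    this.tableName = tableName;
    this.expiredSnapshotCount = expiredSnapshotCount;
    this.planningMillis = planningMillis;
  }

  String tableName() {
    return tableName;
  }

  int expiredSnapshotCount() {
    return expiredSnapshotCount;
  }

  // One manifest list is read per expired snapshot, so the counts match.
  int manifestListsRead() {
    return expiredSnapshotCount;
  }

  long planningMillis() {
    return planningMillis;
  }
}

// Hypothetical policy: plan on the driver for small expirations and
// switch to distributed planning once a threshold is exceeded.
final class ReplanPolicy {
  private final int maxDriverManifestLists; // assumed tuning knob

  ReplanPolicy(int maxDriverManifestLists) {
    if (maxDriverManifestLists < 0) {
      throw new IllegalArgumentException("threshold must be non-negative");
    }
    this.maxDriverManifestLists = maxDriverManifestLists;
  }

  boolean shouldDistribute(int expiredSnapshotCount) {
    return expiredSnapshotCount > maxDriverManifestLists;
  }
}
```

   The same count serves both purposes: emitted as an event it can signal that table maintenance is overdue, and compared against a threshold it can decide where the planning runs. The threshold value itself would need to come from measurements.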


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-help@iceberg.apache.org