Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/15 19:58:27 UTC

[GitHub] [iceberg] rdblue commented on issue #2319: Caching Tables in SparkCatalog via CachingCatalog by default leads to stale data

rdblue commented on issue #2319:
URL: https://github.com/apache/iceberg/issues/2319#issuecomment-799710932


   Originally, we only cached tables for a short period of time: the expectation was that we want fresh data when it is available, but within a reasonable window where results don't change out from under a query (e.g., while planning a self-join). The problem with this was that tables referenced by Spark plans would not get refreshed properly, because the table became an orphaned object and no other operation would call `refresh` after an update.
   
   A few recent changes make this less of a problem: Spark should now correctly invalidate cached plans, as should the Spark actions in Iceberg. Solving the problem the right way changes the trade-off, so I think we should re-introduce table expiration.
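   
   For context, a time-bounded cache is one way expiration like this could work. The sketch below is not the actual `CachingCatalog` code, just an illustration using a Caffeine-style `expireAfterAccess` cache; the `ExpiringTableCache` class, the `LoadedTable` type, and the loader function are hypothetical stand-ins for whatever the catalog actually returns.
   
   ```java
   import com.github.benmanes.caffeine.cache.Cache;
   import com.github.benmanes.caffeine.cache.Caffeine;
   
   import java.time.Duration;
   import java.util.function.Function;
   
   // Illustrative sketch only; not the Iceberg CachingCatalog implementation.
   public class ExpiringTableCache {
     // Hypothetical placeholder for whatever object the underlying catalog returns.
     public static class LoadedTable {
       final String name;
       final long snapshotId;
   
       LoadedTable(String name, long snapshotId) {
         this.name = name;
         this.snapshotId = snapshotId;
       }
     }
   
     private final Cache<String, LoadedTable> cache;
     private final Function<String, LoadedTable> loader;
   
     public ExpiringTableCache(Duration expiration, Function<String, LoadedTable> loader) {
       // Entries expire a fixed interval after last access, bounding how stale a
       // cached table can get without requiring anyone to call refresh explicitly.
       this.cache = Caffeine.newBuilder()
           .expireAfterAccess(expiration)
           .build();
       this.loader = loader;
     }
   
     public LoadedTable load(String name) {
       // Repeated lookups inside the window return the same cached object, so a
       // single plan (e.g. a self-join) sees one consistent version of the table.
       return cache.get(name, loader);
     }
   
     public void invalidate(String name) {
       // Writers can drop the entry immediately instead of waiting for expiry.
       cache.invalidate(name);
     }
   }
   ```
   
   Within the window, both sides of a self-join resolve to the same cached object; once the entry lapses, the next lookup reloads the table and picks up any commits made in the meantime.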
   
   Does that sound reasonable to everyone? @aokolnychyi, @pvary, @edgarRd?

