
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/09/22 16:35:06 UTC

[GitHub] [iceberg] RussellSpitzer commented on issue #1485: Reconsider caching behavior in Spark 3

RussellSpitzer commented on issue #1485:
URL: https://github.com/apache/iceberg/issues/1485#issuecomment-696837537

I actually would have assumed it was a bug if a "Cache" command were invalidated by another table operation. In my mind it should snapshot the table state at that moment. I know that because caching is lazy in Spark, the guarantees on "when" are a bit more iffy, but I think the Spark cache shouldn't be automatically invalidated.

One of my main motivators here is that you could modify this table from a non-Spark framework, and inside Spark you wouldn't even know that it had happened. For example, say I have both Presto and Spark users: why should a Spark user's actions invalidate the cache when a Presto user's would not? If Spark writes invalidate the cache, I come to believe that my actions will always invalidate it, yet there is a set of changes that would not. I think it's better to assume "Cache" gives you a snapshot that will not change unless a refresh is specifically asked for.
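
To make the semantics I'm arguing for concrete, here is a toy sketch in plain Python. It is not Spark or Iceberg code: `ToyTable` and `PinnedCache` are hypothetical names made up for illustration, and the cache here pins its snapshot eagerly, whereas Spark's cache is lazy. The point is only that a write from any engine leaves the cached view alone until someone explicitly refreshes it.

```python
class ToyTable:
    """Hypothetical stand-in for a table made of immutable snapshots."""

    def __init__(self):
        # Snapshot 0 is the empty table; each append adds a new snapshot.
        self._snapshots = [()]

    def append(self, rows):
        # Any engine (Spark, Presto, ...) would commit through the same path.
        self._snapshots.append(self._snapshots[-1] + tuple(rows))

    def current_snapshot_id(self):
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        # Reading without an id returns the latest committed snapshot.
        if snapshot_id is None:
            snapshot_id = self.current_snapshot_id()
        return self._snapshots[snapshot_id]


class PinnedCache:
    """Hypothetical cache that pins the snapshot current at creation time."""

    def __init__(self, table):
        self._table = table
        self.refresh()

    def refresh(self):
        # The only way the cached view changes: an explicit request.
        self.snapshot_id = self._table.current_snapshot_id()
        self._rows = self._table.read(self.snapshot_id)

    def rows(self):
        return self._rows


table = ToyTable()
table.append(["a"])
cache = PinnedCache(table)   # pins the snapshot containing only "a"
table.append(["b"])          # a later write from any engine
stale = cache.rows()         # still the pinned snapshot
cache.refresh()              # explicit refresh picks up the new snapshot
fresh = cache.rows()
```

With this model it doesn't matter which engine made the second append; the cached rows only move when `refresh()` is called.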

The second case seems clearer to me; we definitely should be refreshing there.
