Posted to issues@iceberg.apache.org by "wypoon (via GitHub)" <gi...@apache.org> on 2023/04/29 18:54:30 UTC

[GitHub] [iceberg] wypoon opened a new issue, #7474: Stale data is read in the same session due to CachingCatalog case sensitivity

wypoon opened a new issue, #7474:
URL: https://github.com/apache/iceberg/issues/7474

   ### Feature Request / Improvement
   
   It is known that use of the CachingCatalog can lead to stale data being read from an Iceberg table in one Spark session when the table is updated in another Spark session: https://github.com/apache/iceberg/issues/2319, https://github.com/apache/iceberg/issues/3357. Within the same Spark session, a commit causes the metadata of a cached table to be refreshed, so normally writes should be seen right away by subsequent reads. However, there is a problem even within the same Spark session.
   In a customer SQL workload, we discovered that queries used inconsistent case for database and table names: within the same Spark session, a table is read using an upper case name and updated using a lower case name. This is valid, as SQL is case insensitive for database, table and column names. Normally the new snapshot should be visible to reads immediately after the write, but here it is not, because the read and the write load different table instances from the cache (the cache holds two entries for the same table, under keys that differ only in case). The commit refreshes only the entry used for the write, so the entry used for reads keeps returning stale data until it expires from the cache. (Because each read renews the cache entry, repeated reads keep the stale entry alive, exacerbating the problem.)
   I opened https://github.com/apache/iceberg/pull/7469 to address this problem by providing a conf to control the case sensitivity of the CachingCatalog.
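
The failure mode described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyCachingCatalog`, `Table`, `read_after_write` and the `case_sensitive` flag are hypothetical names invented for this example, not the Iceberg `CachingCatalog` API and not the conf proposed in the PR. Cache expiration is left out; the sketch only shows how two keys that differ in case end up holding two independent table handles.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Stand-in for a loaded table handle with its own view of the current snapshot."""
    snapshot_id: int


@dataclass
class ToyCachingCatalog:
    """Toy model of a caching catalog (hypothetical, not the Iceberg API)."""
    case_sensitive: bool = True
    # Source of truth for the latest snapshot, keyed by lower-cased name.
    current_snapshot: dict = field(default_factory=dict)
    # Cache of table handles, keyed as decided by _key().
    cache: dict = field(default_factory=dict)

    def _key(self, name: str) -> str:
        # A case-sensitive cache uses the name as given; otherwise it is normalized.
        return name if self.case_sensitive else name.lower()

    def load_table(self, name: str) -> Table:
        key = self._key(name)
        if key not in self.cache:
            self.cache[key] = Table(self.current_snapshot.get(name.lower(), 0))
        return self.cache[key]

    def commit(self, name: str) -> None:
        table = self.load_table(name)
        new_id = self.current_snapshot.get(name.lower(), 0) + 1
        self.current_snapshot[name.lower()] = new_id
        # Only the handle used for the commit is refreshed.
        table.snapshot_id = new_id


def read_after_write(case_sensitive: bool) -> int:
    """Read with an upper case name, write with a lower case name, then read again."""
    catalog = ToyCachingCatalog(case_sensitive=case_sensitive)
    catalog.load_table("DB.TBL")   # first read caches a handle
    catalog.commit("db.tbl")       # write goes through a differently cased name
    return catalog.load_table("DB.TBL").snapshot_id
```

In this sketch, `read_after_write(True)` returns the old snapshot id because the read and the write hit different cache entries, while `read_after_write(False)` returns the new one because both names normalize to the same key.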
   
   ### Query engine
   
   Spark


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho closed issue #7474: Stale data is read in the same session due to CachingCatalog case sensitivity

Posted by "szehon-ho (via GitHub)" <gi...@apache.org>.
szehon-ho closed issue #7474: Stale data is read in the same session due to CachingCatalog case sensitivity
URL: https://github.com/apache/iceberg/issues/7474

