Posted to issues-all@impala.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2019/06/17 23:17:00 UTC

[jira] [Commented] (IMPALA-7534) Handle invalidation races in CatalogdMetaProvider cache

    [ https://issues.apache.org/jira/browse/IMPALA-7534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866066#comment-16866066 ] 

Todd Lipcon commented on IMPALA-7534:
-------------------------------------

Reading back over Paul's analysis here, I think the missing link is that the version-numbered cache keys are used for individual objects, but not for the higher levels in the hierarchy (like the table name list and the top-level table object). So this can cause issues like IMPALA-8567, as described above. Assuming a starting state where the table name list is not cached:

- Impalad: some SELECT query calls loadTableNames(), which sends a request to the catalog
- Catalog: returns a list of tables ['foo'], but the response is still in flight
- Catalog: someone issues a DDL statement that creates a table 'bar'. The catalog sends an invalidation to all impalads
- Impalad: the loadTableNames() call is still in flight, but the invalidation arrives on a different thread. The invalidation finds nothing in the cache, so it is a no-op.
- Impalad: the loadTableNames() call completes, and the table list ['foo'] is cached

This leaves the impalad cache in a persistently incorrect state: new calls to loadTableNames() get a cache hit that returns the stale value.
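
A minimal single-threaded replay of the five steps above, assuming a plain cache-aside map as a stand-in for the impalad cache (the class name, the key string and the map itself are hypothetical, not CatalogdMetaProvider code). It shows that an invalidation arriving before the write-back removes nothing, so the stale list stays cached:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Single-threaded replay of the race described above, using a plain
// cache-aside map as a hypothetical stand-in for the impalad-side cache.
public class MissedInvalidationDemo {
  // Hypothetical cache key for one database's table name list.
  static final String KEY = "table-names:db1";

  // Runs the five steps in order and returns what the next
  // loadTableNames() call would find in the cache.
  static List<String> replay() {
    ConcurrentMap<String, List<String>> impaladCache = new ConcurrentHashMap<>();
    List<String> catalogTables = new ArrayList<>(List.of("foo"));

    // Steps 1-2: cache miss; the catalog answers ['foo'], response still in flight.
    List<String> inFlightResponse = List.copyOf(catalogTables);

    // Step 3: a DDL creates 'bar' on the catalog, which sends an invalidation.
    catalogTables.add("bar");

    // Step 4: the invalidation arrives first; nothing is cached, so this is a no-op.
    impaladCache.remove(KEY);

    // Step 5: the in-flight load completes and caches the stale list.
    impaladCache.put(KEY, inFlightResponse);

    // Every later lookup is a cache hit on ['foo'], although the catalog has ['foo', 'bar'].
    return impaladCache.get(KEY);
  }

  public static void main(String[] args) {
    System.out.println(replay()); // prints [foo]
  }
}
```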

To fix this, as discussed in the linked articles, we have two choices:
(1) invalidate can block until any outstanding "loadWithCaching" call for the same key has stored its result in the cache, and then invalidate that entry
(2) invalidate can prevent any outstanding "loadWithCaching" call from writing back its result

Choice 2 is better because it avoids blocking between potentially unrelated operations.
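
One possible shape for choice (2), sketched under assumptions: a load first registers a private placeholder token under its key, invalidate() simply removes whatever is stored there, and the write-back is a compare-and-set (ConcurrentHashMap.replace(key, token, value)) that fails if the token is gone. TokenGuardedCache, LoadToken, get() and getIfPresent() are hypothetical names, not the actual CatalogdMetaProvider fix, and no caller ever waits on another caller's load:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Sketch of choice (2): an invalidation cancels the write-back of any load
// that was in flight when it arrived. All names here are hypothetical.
public class TokenGuardedCache<K, V> {
  // Placeholder stored under a key while its load is in progress. It uses
  // identity equality, so every load gets a distinct token.
  private static final class LoadToken {}

  // Each entry holds either a loaded value or a LoadToken.
  private final ConcurrentHashMap<K, Object> map = new ConcurrentHashMap<>();

  // The loader must not return null (ConcurrentHashMap rejects null values).
  @SuppressWarnings("unchecked")
  public V get(K key, Function<? super K, ? extends V> loader) {
    LoadToken token = new LoadToken();
    Object prev = map.putIfAbsent(key, token);
    if (prev != null && !(prev instanceof LoadToken)) {
      return (V) prev; // cache hit
    }
    // If prev is another caller's token, that caller owns the write-back;
    // this caller still loads, but never blocks on the other load.
    boolean ownsToken = (prev == null);
    V value = loader.apply(key);
    if (ownsToken) {
      // Succeeds only if our token is still in place. If invalidate() removed
      // it while the load was in flight, the possibly stale value is not cached.
      map.replace(key, token, value);
    }
    return value;
  }

  // Returns the cached value, or null if nothing (or only a token) is stored.
  @SuppressWarnings("unchecked")
  public V getIfPresent(K key) {
    Object cur = map.get(key);
    return (cur instanceof LoadToken) ? null : (V) cur;
  }

  // Removes a cached value or an in-flight token; either way, the write-back
  // of a load that registered its token earlier will fail.
  public void invalidate(K key) {
    map.remove(key);
  }

  public static void main(String[] args) {
    TokenGuardedCache<String, String> cache = new TokenGuardedCache<>();
    // The loader simulates an invalidation arriving while its fetch is in flight.
    String first = cache.get("db1", k -> { cache.invalidate(k); return "[foo]"; });
    // The stale result went to this one caller but was not cached, so the next call reloads.
    String second = cache.get("db1", k -> "[foo, bar]");
    System.out.println(first + " " + second); // prints [foo] [foo, bar]
  }
}
```

The caller whose load raced with the invalidation still sees the old list once; the point of this design is only that the stale value is never written back, so it cannot persist in the cache.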

> Handle invalidation races in CatalogdMetaProvider cache
> -------------------------------------------------------
>
>                 Key: IMPALA-7534
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7534
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>             Fix For: Not Applicable
>
>
> There is a well-known race in Guava's LoadingCache, which we use for CatalogdMetaProvider, that we are not currently handling:
> - thread 1 gets a cache miss and makes a request to fetch some data from the catalogd. It fetches the catalog object with version 1 and then gets context-switched out or is otherwise slow
> - thread 2 receives an invalidation for the same object, because it has changed to v2. It calls 'invalidate' on the cache, but nothing is yet cached.
> - thread 1 puts back v1 of the object into the cache
> In essence we've "missed" an invalidation. This is also described in this nice post: https://softwaremill.com/race-condition-cache-guava-caffeine/
> The race is quite unlikely but could cause some unexpected results that are hard to reason about, so we should look into a fix.
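
The comment above notes that version-numbered keys already protect individual objects. A minimal sketch of why, assuming a hypothetical "name@vN" key format and assuming readers always look up the latest version they have been told about: thread 1's stale write-back lands under foo@v1, which nobody asks for once the invalidation has announced v2. It also shows why this cannot help unversioned entries such as the table name list.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of why version-numbered keys avoid the missed invalidation.
// The key format and the latestVersion map are hypothetical.
public class VersionedKeyDemo {
  static String key(String name, long version) {
    return name + "@v" + version;
  }

  // Replays the three steps from the issue description and returns what a
  // reader finds when it looks up the latest version it knows about.
  static String replay() {
    Map<String, String> cache = new ConcurrentHashMap<>();
    Map<String, Long> latestVersion = new ConcurrentHashMap<>();
    latestVersion.put("foo", 1L);

    // Thread 1: cache miss on foo@v1; fetches version 1 from the catalogd.
    long fetchedVersion = latestVersion.get("foo");
    String fetched = "definition of foo at v1";

    // Thread 2: invalidation reports foo is now at v2; nothing is cached yet.
    latestVersion.put("foo", 2L);
    cache.remove(key("foo", 1));

    // Thread 1: writes back under the key for the version it actually fetched.
    cache.put(key("foo", fetchedVersion), fetched);

    // Later readers ask for foo@v2, miss, and reload instead of seeing v1.
    return cache.get(key("foo", latestVersion.get("foo")));
  }

  public static void main(String[] args) {
    System.out.println(replay()); // prints null
  }
}
```

In this sketch the orphaned foo@v1 entry is simply never read again; presumably it would age out under the cache's normal eviction.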



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
