You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2018/09/05 22:53:00 UTC

[jira] [Commented] (IMPALA-7469) Invalidate local catalog cache based on topic updates

    [ https://issues.apache.org/jira/browse/IMPALA-7469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16605041#comment-16605041 ] 

ASF subversion and git services commented on IMPALA-7469:
---------------------------------------------------------

Commit 8dcf54aee2b396378aba1cc84426b91d4976eef8 in impala's branch refs/heads/master from [~tlipcon]
[ https://git-wip-us.apache.org/repos/asf?p=impala.git;h=8dcf54a ]

IMPALA-7469. Invalidate LocalCatalog cache based on topic updates

This implements cache invalidation inside CatalogdMetaProvider. The
design is as follows:

- when the catalogd collects updates into the statestore topic, it now
  adds an additional entry for each table and database. These additional
  entries are minimal - they only include the object's name, but no
  metadata. This new behavior is conditional on a new flag
  --catalog_topic_mode. The default mode is to keep the old style, but
  it can be configured to mixed (support both v1 and v2) or v2-only.

- the old-style topic entries are prefixed with a '1:' whereas the new
  minimal entries are prefixed with a '2:'. The impalad will subscribe
  to one or the other prefix depending on whether it is running with
  --use_local_catalog. Thus, old impalads will not be confused by the
  new entries and vice versa.

- when the impalad gets these topic updates, it forwards them through to
  the catalog implementation. The LocalCatalog implementation forwards
  them to the CatalogdMetaProvider, which uses them to invalidate
  cached metadata as appropriate.

This patch includes some basic unit tests. I also did some manual
testing by connecting to different impalads and verifying that a session
connected to impalad #1 saw the effects of DDLs made by impalad #2
within a short period of time (the statestore topic update frequency).

Existing end-to-end tests cover these code paths pretty thoroughly:

- if we didn't automatically invalidate the cache on a coordinator
  in response to DDL operations, then any test which expects to
  "read its own writes" (eg access a table after creating one)
  would fail
- if we didn't propagate invalidations via the statestore, then
  all of the tests that use sync_ddl would fail.

I verified the test coverage above using some of the tests in
test_ddl.py -- I selectively commented out a few of the invalidation
code paths in the new code and verified that tests failed until I
re-introduced them. Along the way I also improved test_ddl so that, when
this code is broken, it properly fails with a timeout. It also has a bit
of expanded coverage for both the SYNC_DDL and non-SYNC cases.

I also wrote a new custom-cluster test for LocalCatalog that verifies
a few of the specific edge cases like detecting catalogd restart,
SYNC_DDL behavior in mixed mode, etc.

One notable exception here is the implementation of INVALIDATE METADATA
This turned out to be complex to implement, so I left a lengthy TODO
describing the issue and filed a JIRA.

Change-Id: I615f9e6bd167b36cd8d93da59426dd6813ae4984
Reviewed-on: http://gerrit.cloudera.org:8080/11280
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>


> Invalidate local catalog cache based on topic updates
> -----------------------------------------------------
>
>                 Key: IMPALA-7469
>                 URL: https://issues.apache.org/jira/browse/IMPALA-7469
>             Project: IMPALA
>          Issue Type: Sub-task
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>            Priority: Major
>
> When another impalad modifies the state of a catalog object in the statestore, it needs to propagate that invalidation to impalads that might have cached some information about that object. We'll use the catalog statestore topic to do this propagation so that existing code paths around SYNC_DDL can still be supported.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org