You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Quanlong Huang (Jira)" <ji...@apache.org> on 2023/04/26 14:07:00 UTC

[jira] [Updated] (IMPALA-12103) InvalidateMetadata not sending back the new db due to catalog update delay

     [ https://issues.apache.org/jira/browse/IMPALA-12103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Quanlong Huang updated IMPALA-12103:
------------------------------------
    Attachment: statestored.INFO

> InvalidateMetadata not sending back the new db due to catalog update delay
> --------------------------------------------------------------------------
>
>                 Key: IMPALA-12103
>                 URL: https://issues.apache.org/jira/browse/IMPALA-12103
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Catalog
>            Reporter: Quanlong Huang
>            Assignee: Quanlong Huang
>            Priority: Major
>         Attachments: catalogd.INFO.bz2, impalad.INFO.bz2, statestored.INFO
>
>
> INVALIDATE METADATA <table> can be executed under an unloaded db. Catalogd will send back the new db and new table in the DDL response.
> If the same command is executed under an loaded db. Catalogd will only send back the new table since it's assumed that impalad already has the db in the local catalog cache. However, this is not always true.
> Catalogd propagates the update by statestore, and impalad applies the updates. There is a delay between the update inside catalogd and inside impalad. So a db can be loaded in catalogd and unloaded in impalad in a short period (could be several seconds).
> This is revealed by a new test added in https://gerrit.cloudera.org/c/19786/2/tests/metadata/test_hms_integration.py
> {code:python}
>   def test_invalidate_metadata(self, unique_name):
>     """Regression test for IMPALA-12082. Verify invalidate metadata on tables under
>     unloaded db won't fail"""
>     db = unique_name + "_db"
>     tbl = db + "." + unique_name + "_tbl"
>     try:
>       self.run_stmt_in_hive("create database " + db)
>       self.run_stmt_in_hive("create table %s (i int)" % tbl)
>       self.client.execute("invalidate metadata %s" % tbl)
>       res = self.client.execute("describe %s" % tbl)
>       assert res.data == ["i\tint\t"]
>     finally:
>       self.run_stmt_in_hive("drop database %s cascade" % db)
> {code}
> The last DESCRIBE statement could fail due to the new db is not loaded in impalad.
> Here are logs from a failed run: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/19176/
> Due to event-processor is enabled by default. The db could be loaded by CREATE_DATABASE event, which happens in this job. In catalogd logs:
> {noformat}
> I0425 03:29:13.910261 13314 MetastoreEvents.java:638] EventId: 25520 EventType: CREATE_DATABASE Creating event 25520 of type CREATE_DATABASE on database test_invalidate_metadata_2864b236_db
> I0425 03:29:13.910507 13314 Db.java:132] createEventId_ for db: test_invalidate_metadata_2864b236_db set to: 25520
> I0425 03:29:13.910528 13314 Db.java:144] lastSyncedEventId_ for db: test_invalidate_metadata_2864b236_db set from -1 to 25520
> I0425 03:29:13.910552 13314 MetastoreEvents.java:627] EventId: 25520 EventType: CREATE_DATABASE Successfully added database test_invalidate_metadata_2864b236_db
> I0425 03:29:14.696295 13730 catalog-server.cc:824] Collected update: 1:DATABASE:test_invalidate_metadata_2864b236_db, version=7951, original size=242, compressed size=175
> I0425 03:29:14.698257 13730 catalog-server.cc:824] Collected update: 1:CATALOG_SERVICE_ID, version=7954, original size=60, compressed size=58
> I0425 03:29:14.698282 13730 JniUtil.java:177] Finished getCatalogDelta request: Getting catalog delta from version 7942. Time spent: 3ms
> {noformat}
> After 03:29:14, the new db is loaded in catalogd and the update is propagates through statestore.
> Impalad executes the INVALIDATE METADATA command at 03:29:16. At this point, the db is loaded in catalogd. So catalogd only sent back the new table in response. In impalad logs:
> {noformat}
> I0425 03:29:16.147104 12074 impala-beeswax-server.cc:58] query(): query=invalidate metadata test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl
>   01: query (string) = "invalidate metadata test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl",
> I0425 03:29:16.155800 12074 Frontend.java:1999] 3a42793b188d22df:00f3e98000000000] Analyzing query: invalidate metadata test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl db: default
> I0425 03:29:16.171465 12662 ImpaladCatalog.java:228] 3a42793b188d22df:00f3e98000000000] Adding: TABLE:test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl version: 7957 size: 117
> {noformat}
> Note that it only has log about adding the new table, no logs for adding the new db yet.
> Impalad then executed the DESCRIBE command and failed due to the db is unloaded.
> {noformat}
> I0425 03:29:16.208302 12074 impala-beeswax-server.cc:58] query(): query=describe test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl
> I0425 03:29:16.209887 12074 status.cc:129] 5b49bb885975e25e:aee4766200000000] AnalysisException: Could not resolve path: 'test_invalidate_metadata_2864b236_db.test_invalidate_metadata_2864b236_tbl'
> {noformat}
> The catalog update that contains the new db is applied later at 03:29:18
> {noformat}
> I0425 03:29:18.617198 14806 ImpaladCatalog.java:228] Adding: DATABASE:test_invalidate_metadata_2864b236_db version: 7951 size: 242
> I0425 03:29:18.619530 14806 impala-server.cc:2064] Catalog topic update applied with version: 7954 new min catalog object version: 3
> {noformat}
> So the delay is 4 seconds. This might due to the heavy load when running all the tests.
> A possible solution is impalad adds its latest catalog version into the DDL request. So catalogd knows whether impalad has loaded the new db, and send back the new db if it's unloaded in impalad.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org