You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org> on 2023/10/30 23:55:59 UTC

[Impala-ASF-CR] IMPALA-10976: Sync db/table in catalogD to latest HMS event id after the DDL operation for all DDLS from Impala clients

Hello Quanlong Huang, k.venureddy2103@gmail.com, Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/20367

to look at the new patch set (#5).

Change subject: IMPALA-10976: Sync db/table in catalogD to latest HMS event id after the DDL operation for all DDLS from Impala clients
......................................................................

IMPALA-10976: Sync db/table in catalogD to latest
HMS event id after the DDL operation for all DDLS
from Impala clients

The idea is that when any DDL operation is performed by Impala, it also
syncs the db/table to its latest event ID as per HMS. This way updates
to a db/table's are applied in the same order as they appear in the
Notification log table in HMS which ensures consistency. Currently
catalogD applies any updates received from Impala clients in-place.
Instead it should perform an HMS operation first and then replay all
the HMS events since the last synced event id.

Implementation: when the enable_sync_to_latest_event_on_ddls flag is
set to true, we do the DDL operation first, i.e., perform HMS operation
and then sync the db/table in the catalogD's cache to the latest event
in HMS for the corresponding db/table. Currently we fetch all events
greater than the db/table's lastSyncEventId and filter them in the
events processor to sync only the current db/table events. Once
HIVE-27499 is implemented, we can directly fetch the events only for
the respective db/table and process them. Currently, there is no
efficient way to identify if there are pending events for a db/table.

Set 'enable_sync_to_latest_event_on_ddls' to true and
'invalidate_hms_cache_on_ddls' to false to use this feature.

Note: We don't modify the cache using MetastoreEventsProcessor for
alter table rename operation as this is a complex operation regarding
cache modification. Also, we modify cache for a DML operation
'truncate table' using this feature.

Testing:
1) Added few tests in the MetaStoreEventProcessorForTest to verify this
feature that simulates the metadata sync between HMS and Impala.
2) Added few tests in the CatalogHmsSyncToLatestEventIdTest class to
the metadata sync between HMS end point, Catalog Metastore Server and
Impala. The HMS end point serves as common interface to metadata
changes outside the current Impala service such as Hive, Spark or other
Impala service. Also verified the table lastSyncEventId is updated
after the events are sync and confirmed that metastore event processor
ignored these synced events.

Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
---
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M fe/src/test/java/org/apache/impala/catalog/metastore/CatalogHmsSyncToLatestEventIdTest.java
5 files changed, 511 insertions(+), 92 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/67/20367/5
-- 
To view, visit http://gerrit.cloudera.org:8080/20367
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia250d0a943838086c187e5cb7c60035e5a564bbf
Gerrit-Change-Number: 20367
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <k....@gmail.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>