You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org> on 2021/08/02 20:43:37 UTC

[Impala-ASF-CR] IMPALA-10746: Drop table/db from catalog cache when drop table/db HMS apis are accessed from catalog's metastore server.

Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17576 )

Change subject: IMPALA-10746: Drop table/db from catalog cache when drop table/db HMS apis are accessed from catalog's metastore server.
......................................................................


Patch Set 13:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2515
PS10, Line 2515:       Table existingTbl = db.getTable(tblName);
               :       if (existingTbl == null) return null;
> It is. We need existingTbl to get eventId which is set in newly initialized
Oh I see. I missed that part. Thanks for clarifying.


http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17576/10/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@462
PS10, Line 462: 
              :     // Parse db name. Throw error if parsing fails.
              :     String dbName;
              :     String catName;
              :     String[] catAndDbName =
              :      
> Good catch!
I agree with the point 2 above. But I think the deleteEventLog should be added from this method not from within removeDbIfnotAddedLater and removeTableIfNotAddedLater since they are used from events processor as well.


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java
File fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java:

http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@479
PS13, Line 479:           .getNextMetastoreEvents(catalog_, currentEventId,
> @Vihang: Filter for drop_database in catalogOpExecutor is 
I believe it fetches for DROP_TABLE events as well because a cascade drop operation on db will drop the tables as well and we would need to add them to the deleteEventLog. I don't think this API drop_database supports cascade (although I didn't verify in the code). If this code would be executed in a drop database cascade operation then we should fetch the drop_table events as well.


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@456
PS13, Line 456: if (!invalidateCacheOnDDLs_) {
              :       LOG.debug("invalidateCacheOnDDLs_ flag is false, skipping " +
              :           "cache update for hms operation drop_database on db: " +
              :           databaseName);
              :       return;
              :     }
              : 
              :     // Parse db name. Throw error if parsing fails.
              :     String dbName;
              :     String catName;
              :     String[] catAndDbName =
              :         MetaStoreUtils.parseDbName(databaseName, serverConf_);
              :     catName = catAndDbName[0];
              :     dbName = catAndDbName[1];
              :     // get DB to drop from catalog cache
              :     Db dbToDrop = catalog_.getDb(dbName);
              :     if(dbToDrop == null) {
              :       // dbToDrop doesn't exist in cache
              :       return;
              :     }
              :     List<NotificationEvent> events;
              :     try {
              :       events = MetastoreEventsProcessor
              :           .getNextMetastoreEvents(catalog_, currentEventId,
              :               event -> event.getEventType()
              :                   .equalsIgnoreCase(MetastoreEvents.DropDatabaseEvent
              :                       .DROP_DATABASE_EVENT_TYPE)
              :                   && catName.equalsIgnoreCase(event.getCatName())
              :                   && dbName.equalsIgnoreCase(event.getDbName()));
              :     } catch (ImpalaException e) {
              :       String errorMsg = "Unable to process Drop database event for db: " +
              :           dbToDrop.getName();
              :       LOG.error(errorMsg, e);
              :       throw new MetaException(errorMsg);
              :     }
              :     if (events.isEmpty()) {
              :       throw new MetaException(
              :           "Drop database event not received. Check if notification " +
              :               "events are configured in hive metastore");
              :     }
              :     long dropEventId = events.get(events.size() - 1).getEventId();
              :     boolean isRemoved =
              :         catalogOpExecutor_.removeDbIfNotAddedLater(dropEventId,
              :             dbToDrop.getMetaStoreDb());
              :     if (isRemoved) {
              :       LOG.info("Removed database: " + databaseName +
              :           " from cache due to HMS API: drop_database");
> I intentionally didn't do it since there is only one drop db api. Let me kn
I felt the code readability improves if you refactor it out. But I don't have strong opinion about it.


http://gerrit.cloudera.org:8080/#/c/17576/13/fe/src/main/java/org/apache/impala/catalog/metastore/MetastoreServiceHandler.java@2968
PS13, Line 2968: Map<String, String> tblProperties = catalogTbl.getMetaStoreTable().getParameters();
               :     if (tblProperties == null || MetaStoreUtils.isTransactionalTable(tblProperties)) {
> How would it help? If we use Impala's AcidUtils method the logic would be: 
I agree that functionally they both for same. My point was more related to consistency within the code. Several other places in the catalogd use AcidUtils.isTransactionalTable. It makes it easier to lookup the places as well easy to change/enhance the implementation if needed. The other advantage is that we don't need to depend on a non-public API in HMS which can be changed/removed without any notice.



-- 
To view, visit http://gerrit.cloudera.org:8080/17576
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic2e2ad2630e2028b8ad26a6272ee766b27e0935c
Gerrit-Change-Number: 17576
Gerrit-PatchSet: 13
Gerrit-Owner: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Anonymous Coward <ki...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Aug 2021 20:43:37 +0000
Gerrit-HasComments: Yes