You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2023/01/02 16:01:15 UTC

[Impala-ASF-CR] IMPALA:11808: Added support for reload event in catalogD

Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/19378 )

Change subject: IMPALA:11808: Added support for reload event in catalogD
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@518
PS4, Line 518:     FireEventRequestData data = new FireEventRequestData();
What is the expectation for other HMS users than Impala when receiving this event? Will they ignore it because they do not know the new member in the union FireEventRequestData? https://github.com/apache/hive/blob/da42736483ce110997f924a746c63b4bc8a8d5b7/standalone-metastore/metastore-common/src/main/thrift/hive_metastore.thrift#L1480


http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19378/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6401
PS4, Line 6401:       MetastoreShim.fireReloadEventHelper(catalog_.getMetaStoreClient(),
Do I understand correctly that we do not check whether the table has actually changed during REFRESH and always send an event?

I have some concerns about potential performance impact of the change. If N Impala clusters connect to an HMS and all do REFRESH for a table(or INVALIDATE METADATA + actually use the table), then all REFRESH/INVALIDATE METADATA triggers an event and causes other catalogds to also do the reload, leading to each catalogd doing N refresh.

I think that this can be solved for REFRESH by skipping the event generation if actually nothing has changed in the table. If there are changes (e.g. new files were written to an external table without sending the relevant events to HMS) then it useful to notify other Impalas to refresh their metadata snapshots, but otherwise the REFRESH was NOOP, it doesn't seem meaningful to send an event.

INVALIDATE METADATA seems like a harder case - we don't know whether there were any changes while unnecessarily invalidating + reloading the table on other catalogds can be very expensive. I think that it would be actually the best to not propagate INVALIDATE METADATA to other cluster, at least by default. A query option could be added to set whether to send the event, so someone could initiate a "global" INVALIDATE METADATA, but not do it by default.


http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@255
PS4, Line 255: .
nit: +4 indentation


http://gerrit.cloudera.org:8080/#/c/19378/4/tests/custom_cluster/test_events_custom_configs.py@258
PS4, Line 258:       self.client.execute("drop table {}.{}".format(unique_database, test_reload_table))
there is no need to drop the table, unique_database will be cleaned up by test framework



-- 
To view, visit http://gerrit.cloudera.org:8080/19378
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic62d58837d356dc2113f3c0904228ac9de484136
Gerrit-Change-Number: 19378
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Abhishek Rawat <ar...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 02 Jan 2023 16:01:15 +0000
Gerrit-HasComments: Yes