You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org> on 2023/02/08 20:15:47 UTC

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Sai Hemanth Gantasala has uploaded this change for review. ( http://gerrit.cloudera.org:8080/19484


Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a map variable to store the processed reload
event time for the corresponding table/partition.
This will be updated frequently based on refresh or
invalidate commands. This map can be used in the event
processor to decide whether to process or skip the
reload event by comparing the event time of the event
with the latest event time from the map variable.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 97 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/1
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 3:

(7 comments)

Left some comments mostly about style, but looks good overall.

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3876
PS3, Line 3876:   public void setRefreshMetadataMap(Map<String, Map<String, Integer>> metadataMap) {
              :     refreshMetadataMap.set(metadataMap);
              :   }
Seems unused, in which case there is no need for the AtomicReference.


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3881
PS3, Line 3881:   
nit: needs +4 spaces instead of +2


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3890
PS3, Line 3890:   
nit: needs +4 spaces instead of +2


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534
PS3, Line 2534:       Map<String, Integer> partitionMap =
              :           catalog_.getPartitionMapFromRefreshMap(dbName_, tblName_);
nit: Maybe we could hide the internal map, and CatalogServiceCatalog could just have a method that would return the last event time for a (db, table, partition).


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2536
PS3, Line 2536: .containsKey(partitionMapKey) &&
              :           partitionMap.get(partitionMapKey)
nit: just to avoid doing 2 lookups, we could just invoke a single get(), then check if the returned value was null.


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4576
PS3, Line 4576:       int currentTimeinSecs = catalog_.now();
              :       Map<String, Integer> partitionMap =
              :           catalog_.getPartitionMapFromRefreshMap(dbName, tblName);
              :       for (String partName : partNames) {
              :         partitionMap.put(partName, currentTimeinSecs);
              :       }
              :       catalog_.addToRefreshMetadataMap(dbName + "." + tblName, partitionMap);
nit: Maybe CatalogServiceCatalog could have a method that would update its map, e.g.:

updateEventTimes(db, tbl, listOfPartitions);


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6490
PS3, Line 6490:       // Update the entries in the refresh metadata map
              :       Integer currentTimeinSecs = catalog_.now();
              :       String dbName = tblName.getDb();
              :       String tableName = tblName.getTbl();
              :       if (req.isIs_refresh()) {
              :         Map<String, Integer> partitionMap = catalog_.getPartitionMapFromRefreshMap(
              :             dbName, tableName);
              :         if (partVals == null) { // update table entry, since partitions are null
              :           partitionMap.put(HdfsTable.DEFAULT_PARTITION_NAME, currentTimeinSecs);
              :         } else { // update all the partitions values
              :           for (String partitionName : partVals) {
              :             partitionMap.put(partitionName, currentTimeinSecs);
              :           }
              :         }
              :         catalog_.addToRefreshMetadataMap(dbName + "." + tableName, partitionMap);
Thic could be also moved to a new 'updateEventTimes' method.



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 14:04:15 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 7:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@10
PS6, Line 10: the
            : latest event id before loading the table/partition.
            : This will be up
> nit: "the latest event id before loading the table/partition" ?
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@17
PS6, Line 17: 's event id, invalidate event anyway flus
> nit: this seems belonging to the previous patch. No map now.
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@21
PS6, Line 21: optimization to work:
> nit: these need to be updated
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2572
PS5, Line 2572:       if (currentHmsEventId > eventId) {
> Yeah, the new change LGTM. BTW, it seems they are the same solution, i.e. i
If currentHMSEventId is -1, then I'll set current eventId as latestEventID. This is useful in cases where enableSyncToLatestEventOnDdls is set to false (instead of setting latesteventid to -1, I would like to set it to current event id).


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3847
PS5, Line 3847: i
> Yeah, we should change the name, otherwise it's misleading. I think somethi
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@750
PS6, Line 750: happene
> nit: "happened"
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2883
PS6, Line 2883:       LOG.warn(String.format("Unable to fetch latest event id from HMS: %s",
> Let's tolerate the error since this is just for optimization. I think repla
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java@1033
PS6, Line 1033: 
> nit: please keep only one blank line between methods
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java@1039
PS6, Line 1039:     }
> Should we check if eventId is -1 before this? i.e. only update lastRefreshE
Ack


http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py@236
PS6, Line 236: enable_sync_to_latest_event_on_ddls
> If the optimization depends on this flag, we should mention it in the commi
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 15:12:25 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12413/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 09:52:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@300
PS2, Line 300: 
> I think we need ConcurrentHashMap since this will be updated in multi-threa
Ack


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2516
PS1, Line 2516: 
> Sorry, what I mean is clock skew between nodes. It's possible that the HMS 
Ok, Because of the clock skew, (I can think of one example where) we can (possibly) end up processing an older event. And this is inevitable and not desirable. (I cannot think of an example where we miss the event because of clock skew).
Not sure how can overcome this problem.


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6362
PS1, Line 6362:       Reference<Boolean> dbWasAdded = new Reference<Boolean>(false);
> Oh, I just realized the map is updated in fireReloadEventHelper. Can we upd
Ack


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6423
PS1, Line 6423:         throw new TableNotFoundException("Table not found: " +
> nit: need space after '//'
Ack


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6434
PS1, Line 6434: d as a direct DDL operation.
              :       if (tblWasRemoved.getRef()) {
> nit: need spaces after '//', 'for' and around ':'
Ack


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4513
PS2, Line 4513:   /**
> It seems we don't need to set the map back. It's updated in place.
Ack


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6425
PS2, Line 6425:             req.getTable_name().getTable_name());
> Why do we need to create a new map here? Can we directly modify the map cor
Ack


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6444
PS2, Line 6444:         Db addedDb = catalog_.getDb(updatedThriftTable.getTable().getDb_name());
> This might have concurrent issue when there are concurrent refreshes on dif
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 03:35:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12548/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Mar 2023 12:44:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@301
PS4, Line 301:   final TopicMode topicMode_;
> After checking more scenarios, I realized this map could lead to mem leak i
Ack. I'll use EventId instead of Event time to overcome the clock skew issue.


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2692
PS4, Line 2692:       }
> We should remove the map item before returning here.
Ack


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2702
PS4, Line 2702:     Table newTable = addIncompleteTable(dbName, tblName,
> Is it possible if the db is not in the catalog cache but somehow we have co
Ack


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3882
PS4, Line 3882: 
> If the partitionName doesn't exist in the keys, it's also possible that the
Ack


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534
PS4, Line 2534: 
> nit: "lastRefreshTime" might be more clear
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Mar 2023 12:25:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12342/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Feb 2023 20:36:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 2:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@300
PS2, Line 300: HashMap
I think we need ConcurrentHashMap since this will be updated in multi-threads.


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2516
PS1, Line 2516: 
> Yeah, the event time is generated by HMS. This variable stores the latest t
Sorry, what I mean is clock skew between nodes. It's possible that the HMS and catalogd nodes have several milliseconds difference. If any of them have connection issues with the ntp server, the time difference could be even larger.


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6362
PS1, Line 6362:       }
> Yeah, I did cover the normal refresh cases as well. Did I miss anything?
Oh, I just realized the map is updated in fireReloadEventHelper. Can we update the method name to reflect this, e.g. fireReloadEventAndUpdateRefreshMap?


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6423
PS1, Line 6423:       //Update the entries in the refresh metadata map
nit: need space after '//'


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6434
PS1, Line 6434: //update all the partitions values
              :           for(String partitionName:
nit: need spaces after '//', 'for' and around ':'


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4513
PS2, Line 4513:       catalog_.setRefreshMetadataMap(refreshMetadataMap);
It seems we don't need to set the map back. It's updated in place.


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6398
PS2, Line 6398:       catalog_.setRefreshMetadataMap(refreshMetadataMap);
I think we can simply set an empty map and don't need to get and clear the current map. Or if we get and clear the current map, we don't need to set it back.


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6425
PS2, Line 6425:           new HashMap<>(catalog_.getRefreshMetadataMap());
Why do we need to create a new map here? Can we directly modify the map corresponding to the table?


http://gerrit.cloudera.org:8080/#/c/19484/2/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6444
PS2, Line 6444:       catalog_.setRefreshMetadataMap(refreshMetadataMap);
This might have concurrent issue when there are concurrent refreshes on different tables.



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Sat, 25 Feb 2023 05:29:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2572
PS5, Line 2572:       tbl.setLastRefreshEventId(currentHmsEventId);
> The above call on getCurrentNotificationEventId() could fail. Let's only se
My intention here is to set the lastRefreshEventId on the table as high as possible. Is the new change acceptable?


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3847
PS5, Line 3847: >
> Why is a larger event id outdated?
I wanted to check if the incoming event id is outdated than the current partition event id. In that case I'll change the name of the method to 'isPartitionUpdated'.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1230
PS5, Line 1230:   public void load(boolean reuseMetadata, IMetaStoreClient client,
> This is used in many DML code paths, e.g. AlterTable. We need to update las
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891
PS5, Line 2891: client.getCurrentNotificationEventId()
              :             .getEventId()
> Can we get this once before the loop? This might introduce lots of RPCs for
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/Table.java@1044
PS5, Line 1044: create
> nit: "refresh" ?
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@155
PS5, Line 155:       }
> I think we should also update lastRefreshEventId here (to be latestEventId)
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@36
PS5, Line 36: import org.apache.hadoop.hive.common.FileUtils;
            : import org.apache.hadoop.hive.metastore.api.FieldSchema;
> nit: unused imports?
Ack


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2527
PS5, Line 2527: first
> nit: "for" ?
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 11:05:50 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#4).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a map variable to store the processed reload
event time for the corresponding table/partition.
This will be updated frequently based on refresh or
invalidate commands. This map can be used in the event
processor to decide whether to process or skip the
reload event by comparing the event time of the event
with the latest event time from the map variable.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 117 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/4
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a new variable 'lastRefreshEventId' in the
catalogD's table/partition object to store the
latest event id before loading the table/partition.
This will be updated frequently based on refresh
or invalidate commands. This variable can be used
in the event processor to decide whether to process
or skip the reload event by comparing it with the
current event id. It is enough to store the refresh
event's event id, invalidate event anyway flushes
out the object from cache.

Note: Need to enable two configs for this
optimization to work:
1) enable_reload_events=true
2) enable_sync_to_latest_event_on_ddls=true

Testing: Added a test to fire few reload events via
HMS API and then verify in the event processor that
some older events are skipped.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Reviewed-on: http://gerrit.cloudera.org:8080/19484
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
8 files changed, 135 insertions(+), 13 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 10
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py@293
PS7, Line 293: r
flake8: F841 local variable 'resp' is assigned to but never used



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 15:12:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 6:

(11 comments)

The patch looks pretty good to me!

http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@10
PS6, Line 10: the
            : processed reload event's ID for the corresponding
            : table/partition
nit: "the latest event id before loading the table/partition" ?


http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@17
PS6, Line 17: the latest event id from the map variable
nit: this seems belonging to the previous patch. No map now.


http://gerrit.cloudera.org:8080/#/c/19484/6//COMMIT_MSG@21
PS6, Line 21: Testing: Couldn't test this feature in an end-to-end
nit: these need to be updated


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2572
PS5, Line 2572:       if (currentHmsEventId > eventId) {
> My intention here is to set the lastRefreshEventId on the table as high as 
Yeah, the new change LGTM. BTW, it seems they are the same solution, i.e. if currentHmsEventId < eventId, it should be -1.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3847
PS5, Line 3847: m
> I wanted to check if the incoming event id is outdated than the current par
Yeah, we should change the name, otherwise it's misleading. I think something like "isPartitionLoadedAfterEvent" might be more clear.


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@750
PS6, Line 750: happend
nit: "happened"


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2883
PS6, Line 2883:       throw new CatalogException(exception.getMessage());
Let's tolerate the error since this is just for optimization. I think replacing this with a warning log might be better.


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java@1033
PS6, Line 1033: 
nit: please keep only one blank line between methods


http://gerrit.cloudera.org:8080/#/c/19484/6/fe/src/main/java/org/apache/impala/catalog/Table.java@1039
PS6, Line 1039:     lastRefreshEventId_ = eventId;
Should we check if eventId is -1 before this? i.e. only update lastRefreshEventId_ when it's smaller than eventId.


http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py@236
PS6, Line 236: enable_sync_to_latest_event_on_ddls
If the optimization depends on this flag, we should mention it in the commit message. I feel like we can consolidate lastSyncedEventId and lastRefreshEventId. But it would be a more complex change. We can revisit it in a sperate JIRA.


http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py@288
PS6, Line 288: 
Not sure if it's complex but we are missing test coverage on skipping partition-level reload events.



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 11:57:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12595/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 06:00:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12583/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 15:31:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py@293
PS7, Line 293: s
> nit: Let's remove this unused 'resp' variable
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 05:41:01 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 06:35:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#5).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a new variable 'lastRefreshEventId' in the
catalogD's table/partition object to store the
processed reload event's ID for the corresponding
table/partition. This will be updated frequently
based on refresh or invalidate commands. This
variable can be used in the event processor to
decide whether to process or skip the
reload event by comparing the event id of the event
with the latest event id from the map variable. It
is enough to store the refresh event's event id,
invalidate event anyway flushes out the object from cache.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
6 files changed, 98 insertions(+), 11 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/5
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 4:

(5 comments)

Sorry to be late here but I still have some concerns on the solution.

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@301
PS4, Line 301:   private AtomicReference<Map<String, Map<String,Integer>>> refreshMetadataMap =
After checking more scenarios, I realized this map could lead to mem leak if items are not removed correctly. We should carefully maintain their life cycles to match their corresponding catalog objects. E.g. when a partition or table is dropped, we need to remove the corresponding item. For a DROP DB CASCADE statement, we need to remove items of all the underlying tables. For RENAME operations on a table or db, we should also update the map correspondingly. These are not handled by the current patch. There might be more cases that we need to review.

Instead of maintaining a map like this, another solution is adding a "lastRefreshTime" field in the Table and HdfsPartition objects. The advantage is we don't need to worry about life cycles of the map items. The required changes might be similar to this patch but we don't need to add codes about removing map items when the corresponding catalog objects are dropped/renamed.

What do you think?


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2692
PS4, Line 2692:           return result.toTCatalogObject();
We should remove the map item before returning here.


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2702
PS4, Line 2702:         return null;
Is it possible if the db is not in the catalog cache but somehow we have corresponding items in the refreshMetadataMap?


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3882
PS4, Line 3882:     return null;
If the partitionName doesn't exist in the keys, it's also possible that there was a table level refresh so the map was cleared and only has the key of DEFAULT_PARTITION_NAME. In this case we should check DEFAULT_PARTITION_NAME as well.


http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534
PS4, Line 2534: eventTime
nit: "lastRefreshTime" might be more clear



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sun, 05 Mar 2023 13:07:39 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#6).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a new variable 'lastRefreshEventId' in the
catalogD's table/partition object to store the
processed reload event's ID for the corresponding
table/partition. This will be updated frequently
based on refresh or invalidate commands. This
variable can be used in the event processor to
decide whether to process or skip the
reload event by comparing the event id of the event
with the latest event id from the map variable. It
is enough to store the refresh event's event id,
invalidate event anyway flushes out the object from cache.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
8 files changed, 125 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/6
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#2).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a map variable to store the processed reload
event time for the corresponding table/partition.
This will be updated frequently based on refresh or
invalidate commands. This map can be used in the event
processor to decide whether to process or skip the
reload event by comparing the event time of the event
with the latest event time from the map variable.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 109 insertions(+), 2 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/2
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#3).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a map variable to store the processed reload
event time for the corresponding table/partition.
This will be updated frequently based on refresh or
invalidate commands. This map can be used in the event
processor to decide whether to process or skip the
reload event by comparing the event time of the event
with the latest event time from the map variable.

Testing: Couldn't test this feature in an end-to-end
test because the events would be filtered as self
events. Tested it manually by removing the self event
logic.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
3 files changed, 114 insertions(+), 5 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/3
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12580/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 08:11:48 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#7).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a new variable 'lastRefreshEventId' in the
catalogD's table/partition object to store the
latest event id before loading the table/partition.
This will be updated frequently based on refresh
or invalidate commands. This variable can be used
in the event processor to decide whether to process
or skip the reload event by comparing it with the
current event id. It is enough to store the refresh
event's event id, invalidate event anyway flushes
out the object from cache.

Note: Need to enable two configs for this
optimization to work:
1) enable_reload_events=true
2) enable_sync_to_latest_event_on_ddls=true

Testing: Added a test to fire few reload events via
HMS API and then verify in the event processor that
some older events are skipped.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
8 files changed, 135 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/7
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/12520/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 3
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Thu, 02 Mar 2023 19:27:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Hello Quanlong Huang, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/19484

to look at the new patch set (#8).

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................

IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Added a new variable 'lastRefreshEventId' in the
catalogD's table/partition object to store the
latest event id before loading the table/partition.
This will be updated frequently based on refresh
or invalidate commands. This variable can be used
in the event processor to decide whether to process
or skip the reload event by comparing it with the
current event id. It is enough to store the refresh
event's event id, invalidate event anyway flushes
out the object from cache.

Note: Need to enable two configs for this
optimization to work:
1) enable_reload_events=true
2) enable_sync_to_latest_event_on_ddls=true

Testing: Added a test to fire few reload events via
HMS API and then verify in the event processor that
some older events are skipped.

Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/TableLoader.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M tests/custom_cluster/test_events_custom_configs.py
8 files changed, 135 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/19484/8
-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 8
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 9: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 11:47:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2442
PS1, Line 2442:     private int eventTime_;
> nit: need a blank line after this
Ack


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2514
PS1, Line 2514: catalog_.getPartitionMapFromRefreshMap(dbName_, tblName_
> nit: we can extract the code of getting the map value
Ack


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2516
PS1, Line 2516: 
> Is the eventTime generated by HMS? If there are time skew between the HMS n
Yeah, the event time is generated by HMS. This variable stores the latest timestamp of catalogD node when there is a refresh event, the HMS timestamp is generated much later. So I don't think you can skip refresh events by any chance. 

In HMS, the timestamp generation is in a private method of the DbNotificationListener class. The timestamp is unix epoch, so we don't need to compute the time difference. WHat do you think?


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6362
PS1, Line 6362:       }
> Should we update 'refreshMetadataMap' for normal refresh cases?
Yeah, I did cover the normal refresh cases as well. Did I miss anything?



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 2
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Feb 2023 16:22:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Sai Hemanth Gantasala (Code Review)" <ge...@cloudera.org>.
Sai Hemanth Gantasala has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3876
PS3, Line 3876:   public Integer getEventTimeFromRefreshMap(String dbName, String tblName,
              :       String partitionName) {
              :    
> Seems unused, in which case there is no need for the AtomicReference.
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3881
PS3, Line 3881:   
> nit: needs +4 spaces instead of +2
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3890
PS3, Line 3890:   
> nit: needs +4 spaces instead of +2
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2534
PS3, Line 2534:       Integer eventTime = catalog_.getEventTimeFromRefreshMap(dbName_, tblName_,
              :           partitionMapKey);
> nit: Maybe we could hide the internal map, and CatalogServiceCatalog could 
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2536
PS3, Line 2536:  null && eventTime >= eventTime_) {
              :         return true;
> nit: just to avoid doing 2 lookups, we could just invoke a single get(), th
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4576
PS3, Line 4576:       catalog_.updateEventTimesOfRefreshMap(dbName, tblName, partNames);
              :       return numOfPartsReloaded;
              :     } catch (TableLoadingException e) {
              :       LOG.info("Could not reload {} partitions of table {}", partNames.size(),
              :           table.getFullName(), e);
              :     } catch (InternalException e) {
              :       errorOccured = true;
> nit: Maybe CatalogServiceCatalog could have a method that would update its 
Ack


http://gerrit.cloudera.org:8080/#/c/19484/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6490
PS3, Line 6490:         // remove the entries of table and its partitions
              :         catalog_.removeEntryFromRefreshMap(dbName + "." + tableName);
              :       }
              :       // Get new catalog version for table refresh/invalidate.
              :       long newCatalogVersion = updatedThriftTable.getCatalog_version();
              :       Map<String, String> tableParams = new HashMap<>();
              :       tableParams.put(MetastoreEventPropertyKey.CATALOG_SERVICE_ID.getKey(),
              :           catalog_.getCatalogServiceId());
              :       tableParams.put(MetastoreEventPropertyKey.CATALOG_VERSION.getKey(),
              :           String.valueOf(newCatalogVersion));
              :       MetastoreShim.fireReloadEventHelper(catalog_.getMetaStoreClient(),
              :           req.isIs_refresh(), partVals, dbName, tableName,
              :           tableParams);
              :       if (req.isIs_refresh()) {
              :         if (catalog_.tryLock(tbl, true, 600000)) {
> Thic could be also moved to a new 'updateEventTimes' method.
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 03 Mar 2023 18:27:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 4:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/12535/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 04 Mar 2023 04:26:58 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/6/tests/custom_cluster/test_events_custom_configs.py@284
PS6, Line 284: r
flake8: F841 local variable 'resp' is assigned to but never used



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 6
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Mar 2023 07:52:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9126/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 06:36:13 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 9: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 9
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 06:36:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 7: Code-Review+1

(1 comment)

LGTM

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/19484/7/tests/custom_cluster/test_events_custom_configs.py@293
PS7, Line 293: r
> flake8: F841 local variable 'resp' is assigned to but never used
nit: Let's remove this unused 'resp' variable



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 7
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Mar 2023 05:11:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/19484/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19484/1//COMMIT_MSG@17
PS1, Line 17: Testing: Couldn't test this feature in an end-to-end
I think we need our infra scripts to support launching multiple Impala clusters in our dev box. Filed IMPALA-11933.


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@294
PS1, Line 294:   private Map<String, Map<String,Integer>> refreshMetadataMap = new HashMap<>();
Could you add a comment about the key and value types?


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2442
PS1, Line 2442:     private int eventTime_;
nit: need a blank line after this


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2514
PS1, Line 2514: refreshMetadataMap.getOrDefault(dbName_ + "." + tblName_
nit: we can extract the code of getting the map value


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2516
PS1, Line 2516: >= eventTime_
Is the eventTime generated by HMS? If there are time skew between the HMS node and catalogd node, we might ignore some reload events unexpectedly.

Is there an HMS API that can detect the HMS local time? Or can we get the time difference (if there is) in self event detection?


http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/19484/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@6362
PS1, Line 6362:       }
Should we update 'refreshMetadataMap' for normal refresh cases?



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 1
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Feb 2023 12:31:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events

Posted by "Quanlong Huang (Code Review)" <ge...@cloudera.org>.
Quanlong Huang has posted comments on this change. ( http://gerrit.cloudera.org:8080/19484 )

Change subject: IMPALA-11822: Optimize the Refresh/Invalidate event processing by skipping unnecessary events
......................................................................


Patch Set 5:

(9 comments)

Thanks for implementing the new solution so quickly!

http://gerrit.cloudera.org:8080/#/c/19484/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/19484/5//COMMIT_MSG@21
PS5, Line 21: Testing: Couldn't test this feature in an end-to-end
Maybe we can try adding two kinds of tests:

1. Try firing reload events using the HMS thrift APIs in python codes, then verified the behaviours in catalogd.
2. Expose the lastRefreshEventId in catalogd's webUI. Verified it after some operations.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@2572
PS5, Line 2572:       tbl.setLastRefreshEventId(currentHmsEventId);
The above call on getCurrentNotificationEventId() could fail. Let's only set this when currentHmsEventId != -1


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@3847
PS5, Line 3847: >
Why is a larger event id outdated?


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@1230
PS5, Line 1230:   public void load(boolean reuseMetadata, IMetaStoreClient client,
This is used in many DML code paths, e.g. AlterTable. We need to update lastRefreshEventId as well. But I think it's ok to do it in follow-up JIRAs.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2891
PS5, Line 2891: client.getCurrentNotificationEventId()
              :             .getEventId()
Can we get this once before the loop? This might introduce lots of RPCs for partitioned tables.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/Table.java@1044
PS5, Line 1044: create
nit: "refresh" ?


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java
File fe/src/main/java/org/apache/impala/catalog/TableLoader.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/TableLoader.java@155
PS5, Line 155:       }
I think we should also update lastRefreshEventId here (to be latestEventId). So ReloadEvents before that can be skipped.


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@36
PS5, Line 36: import org.apache.hadoop.hive.common.FileUtils;
            : import org.apache.hadoop.hive.metastore.api.FieldSchema;
nit: unused imports?


http://gerrit.cloudera.org:8080/#/c/19484/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@2527
PS5, Line 2527: first
nit: "for" ?



-- 
To view, visit http://gerrit.cloudera.org:8080/19484
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I905957683a96c3ea01ab4bf043d6658ce37b7574
Gerrit-Change-Number: 19484
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sai Hemanth Gantasala <sa...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Mar 2023 01:15:17 +0000
Gerrit-HasComments: Yes