Posted to reviews@impala.apache.org by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org> on 2019/03/01 00:31:08 UTC

[Impala-ASF-CR] IMPALA-7972 Detect self-events to avoid unnecessary invalidates

Vihang Karajgaonkar has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/12591 )

Change subject: IMPALA-7972 Detect self-events to avoid unnecessary invalidates
......................................................................

IMPALA-7972 Detect self-events to avoid unnecessary invalidates

This patch adds support for detecting events generated by the catalog
itself (self-events). This is used to avoid unnecessarily invalidating
the tables touched by such self-events. Currently, the alter_table,
alter_partition, add_partition and drop_partition event types can
invalidate the table metadata.

Originally, we planned to rely on global version number support in the
metastore (see HIVE-21115). Since that is still not complete, we rely
on a combination of other identifiers to determine whether an event is
self-generated. These self-event identifiers consist of two values
stored in the table/partition parameters: a catalog service uuid and
the catalog version number. The uuid is generated for each catalog
service when it comes up, and the service adds it to the
table/partition parameters with the key "impala.CatalogServiceId". The
catalog version number is added with the key "impala.CatalogVersion".

When the catalog executes a DDL operation, it appends the current
catalog version to the table's list of version numbers for in-flight
events. The events processor removes this version from the list when
it receives an event carrying the matching serviceId and version
number. This is needed because an external non-Impala system could
later generate an event that presents the same serviceId and version
number. The algorithm to detect a self-event is as follows.

1. Add the service id and expected catalog version to the
table/partition parameters when executing the DDL operation. When the
HMS operation is successful, add the version number to the list of
versions for in-flight events at the table level.
2. When an event is received and its combination of serviceId and
version number is seen for the first time, the event processor removes
the version number from the table's list and treats the event as
self-generated (and hence ignores it).
3. If the event data presents an unknown serviceId, or if the version
number is not present in the list of in-flight versions, the event is
not a self-event and needs to be processed.
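
The three steps above can be sketched roughly as follows. This is only
an illustration, not the code in this patch: the class and method names
(SelfEventSketch, stampParams, onHmsSuccess, isSelfEvent) are
hypothetical, and only the two parameter keys come from the description
above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of self-event detection for a single table. */
class SelfEventSketch {
  // Parameter keys named in the commit message.
  static final String SERVICE_ID_KEY = "impala.CatalogServiceId";
  static final String VERSION_KEY = "impala.CatalogVersion";

  // Uuid generated when this catalog service comes up.
  private final String serviceId;
  // Versions of in-flight events for this table.
  private final List<Long> inFlightVersions = new ArrayList<>();

  SelfEventSketch(String serviceId) {
    this.serviceId = serviceId;
  }

  /** Step 1: parameters to attach to the table/partition before the HMS call. */
  Map<String, String> stampParams(long catalogVersion) {
    Map<String, String> params = new HashMap<>();
    params.put(SERVICE_ID_KEY, serviceId);
    params.put(VERSION_KEY, Long.toString(catalogVersion));
    return params;
  }

  /** Step 1, continued: record the version once the HMS operation succeeded. */
  void onHmsSuccess(long catalogVersion) {
    inFlightVersions.add(catalogVersion);
  }

  /**
   * Steps 2 and 3: returns true only the first time a known
   * (serviceId, version) pair is seen, and removes the version so that
   * a later event with the same pair is processed normally.
   */
  boolean isSelfEvent(Map<String, String> eventParams) {
    if (!serviceId.equals(eventParams.get(SERVICE_ID_KEY))) return false;
    String version = eventParams.get(VERSION_KEY);
    if (version == null) return false;
    long parsed;
    try {
      parsed = Long.parseLong(version);
    } catch (NumberFormatException e) {
      return false;
    }
    // remove(Object) returns true only if the version was in the list.
    return inFlightVersions.remove(Long.valueOf(parsed));
  }
}
```

In this sketch, a second event that carries the same serviceId and
version number is no longer treated as a self-event, because the
version was removed when the first event was seen.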

In order to limit the total memory footprint, only 10 version numbers
are stored per table. Since the event processor is expected to poll
every few seconds, this should be a reasonable bound which satisfies
most use-cases. If the bound is exceeded, the event processor may fail
to recognize a self-event and invalidate the table anyway. In such a
case, it's a performance penalty, not a correctness issue.
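
A minimal sketch of such a bounded list is shown below. The description
above does not say what happens when the list is full; rejecting the
new version is an assumption made here for illustration, and the class
and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical bounded list of in-flight versions for one table. */
class BoundedInFlightVersions {
  // Bound taken from the commit message.
  static final int MAX_VERSIONS = 10;

  private final List<Long> versions = new ArrayList<>();

  /**
   * Records a version. Returns false when the list is full (assumed
   * policy); the matching event is then not recognized as a self-event
   * and causes an invalidate, which costs performance but stays correct.
   */
  boolean add(long version) {
    if (versions.size() >= MAX_VERSIONS) return false;
    versions.add(version);
    return true;
  }

  /** Removes the version if present; returns whether it was found. */
  boolean removeIfPresent(long version) {
    return versions.remove(Long.valueOf(version));
  }

  int size() {
    return versions.size();
  }
}
```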

In the case of a drop_partition event, the partition object is not
available in the event. Hence we cannot determine whether it is a
self-event. In such cases we currently always issue an invalidate
command. This is a known limitation and will be improved in
IMPALA-7973.

The patch adds new tests that trigger alter table/partition DDLs from
Impala and verify that the table is not invalidated.

Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
6 files changed, 1,193 insertions(+), 211 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/91/12591/8
-- 
To view, visit http://gerrit.cloudera.org:8080/12591
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I6db0d7f7fe465158fc8cb9d6b6b57a321827b353
Gerrit-Change-Number: 12591
Gerrit-PatchSet: 8
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Bharath Krishna <bh...@cloudera.com>
Gerrit-Reviewer: Bharath Vissapragada <bh...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Paul Rogers <pr...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>