You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Xiaomeng Zhang (Code Review)" <ge...@cloudera.org> on 2020/04/06 02:32:16 UTC

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Xiaomeng Zhang has uploaded this change for review. ( http://gerrit.cloudera.org:8080/15648


Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................

IMPALA-8632: Add support for self-event detection for insert events

In case of INSERT_EVENTS if Impala inserts into a table it causes a
refresh to the underlying table/partition. This could be unnecessary
when there is only one Impala cluster in the system.
We can detect a self-event in such cases when the HMS API to fire a
listener event returns the event id. This is used by EventProcessor
to ignore the event when it is fetched later in the next polling cycle.

Testing:
Add testInsertEvents() in MetastoreEventsProcessorTest.java to test
insert event self-event detection when insert into table and partition.

Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
A fe/src/main/java/org/apache/impala/catalog/events/EventId.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
10 files changed, 297 insertions(+), 83 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/15648/1
-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 1
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 4:

(9 comments)

mostly looks good to me. Should be okay once we add the suggestions.

http://gerrit.cloudera.org:8080/#/c/15648/3/be/src/common/global-flags.cc
File be/src/common/global-flags.cc:

http://gerrit.cloudera.org:8080/#/c/15648/3/be/src/common/global-flags.cc@254
PS3, Line 254: 
             : DEFINE_int32(hms_event_polling_interval_s, 0,
> I think this is implementation detail and can be skipped. Also the summary 
Are you planning to update this in a subsequent patch?


http://gerrit.cloudera.org:8080/#/c/15648/4/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/15648/4/bin/impala-config.sh@177
PS4, Line 177: export CDP_BUILD_NUMBER=2506706
The changes in this file should be made in a separate change. Also, this build number has a problem which causes a test failure. I uploaded a different patch to bump it up a better build. https://gerrit.cloudera.org/#/c/15668/. May be revert the changes to this file and rebase on top of my patch linked above?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@555
PS4, Line 555: null
May be change this to Collections.emptyList() so that the callers don't need to check for null.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@997
PS4, Line 997:   public static class InsertEventInfo {
this duplicates a lot of code between the two version of the MetastoreShim. I think we can keep the existing fireInsertEvent method in the MetastoreUtils method. Here we can just make use of the return types. For instance, in hive-2 shim we do

public static List<Long> fireInsertEvent(IMetaStoreClient msClient,
      List<InsertEventInfo> insertEventInfos) throws TException {
  MetastoreUtil.fireInsertEvent(msClient, insertEventInfos);
  return Collections.emptyList();
}

in hive-3 shim we do
public static List<Long> fireInsertEvent(IMetaStoreClient msClient,
      List<InsertEventInfo> insertEventInfos) throws TException {
  FireEventResponse resp = MetastoreUtil.fireInsertEvent(msClient, insertEventInfos);
  Preconditions.checkState(resp.getEventIds() != null && !resp.getEventIds().isEmpty());
  return resp.getEventIds();
}


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@35
PS4, Line 35:   private final long idFromEvent_;
rename to insertEventId_?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@69
PS4, Line 69:     idFromEvent_ = eventId;
can you add a preconditions check that this eventId>0


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4487
PS4, Line 4487:     // Firing insert events by making calls to HMS APIs can be slow for tables with
              :     // large number of partitions.
this comment should be updated with the bulk API it should not be a problem anymore. May be add a TODO to evaluate the performance of bulk API


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4506
PS4, Line 4506: e
can be moved to earlier line.


http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@725
PS3, Line 725:     Assume.assumeTrue("Skipping this test because it only works with Hive-3 or greater",
> Can you also enable the which is commented out here: https://github.com/apa
ping



-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Apr 2020 22:21:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Xiaomeng Zhang (Code Review)" <ge...@cloudera.org>.
Xiaomeng Zhang has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................

IMPALA-8632: Add support for self-event detection for insert events

In case of INSERT_EVENTS if Impala inserts into a table it causes a
refresh to the underlying table/partition. This could be unnecessary
when there is only one Impala cluster in the system.
We can detect a self-event in such cases when the HMS API to fire a
listener event returns the event id. This is used by EventProcessor
to ignore the event when it is fetched later in the next polling cycle.

Testing:
Add testInsertFromImpala() in MetastoreEventsProcessorTest.java to test
insert event self-event detection when insert into table and partition.

Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M bin/impala-config.sh
M common/thrift/BackendGflags.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_event_processing.py
18 files changed, 487 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/15648/5
-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 5
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Xiaomeng Zhang (Code Review)" <ge...@cloudera.org>.
Xiaomeng Zhang has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@286
PS3, Line 286:    *
> guess this will be removed once we bump up the CDP_BUILD right?
Yes. It will be removed after I have hive jar updated.


http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@849
PS3, Line 849: versionNumber == -1
> Why would this be a case?
As we check -1 for version number. I am checking -1 for eventId as well. Is it guaranteed that eventId will not be less than 0?


http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@897
PS3, Line 897:     Preconditions.checkState(inFlightEvents_.size(false) == 0);
> do we need a similar check for inFlightEvents_.size(false)?
Done


http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4411
PS3, Line 4411: ram isInsertOverwrite indicates if the operation was an inse
> I think this check should be ((!catalog_.isEventProcessingActive() && isIns
Done


http://gerrit.cloudera.org:8080/#/c/15648/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4466
PS3, Line 4466: sMapBeforeInsert.entrySet().iterator().next();
> I think we need to do this via MetastoreShim since otherwise the response w
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Apr 2020 21:41:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5703/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 1
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Mon, 06 Apr 2020 03:17:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Anurag Mantripragada (Code Review)" <ge...@cloudera.org>.
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 5: Code-Review+1

Thanks for addressing my comments. I will let Vihang +2 the change.


-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 5
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Apr 2020 05:17:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 5:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/5746/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 5
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Wed, 08 Apr 2020 05:27:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5741/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Apr 2020 22:23:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................

IMPALA-8632: Add support for self-event detection for insert events

In case of INSERT_EVENTS if Impala inserts into a table it causes a
refresh to the underlying table/partition. This could be unnecessary
when there is only one Impala cluster in the system.
We can detect a self-event in such cases when the HMS API to fire a
listener event returns the event id. This is used by EventProcessor
to ignore the event when it is fetched later in the next polling cycle.

Testing:
Add testInsertFromImpala() in MetastoreEventsProcessorTest.java to test
insert event self-event detection when insert into table and partition.

Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Reviewed-on: http://gerrit.cloudera.org:8080/15648
Reviewed-by: Vihang Karajgaonkar <vi...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_event_processing.py
17 files changed, 504 insertions(+), 171 deletions(-)

Approvals:
  Vihang Karajgaonkar: Looks good to me, approved
  Impala Public Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 10
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Anurag Mantripragada (Code Review)" <ge...@cloudera.org>.
Anurag Mantripragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 4:

(15 comments)

Thanks for the change, I have added some comments, most of them are style-related and some nits.

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@54
PS4, Line 54: import org.apache.impala.catalog.events.NoOpEventProcessor;
Nit: Duplicate import, could you remove this?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@934
PS4, Line 934:                     
Nit: Incorrect indentation. Remove the empty spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@937
PS4, Line 937:                      
Incorrect indentation. Remove the empty spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/Table.java@807
PS4, Line 807:                      
Nit: Incorrect indentation. Remove the empty spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/Table.java@826
PS4, Line 826:                     
Nit: Incorrect indentation. Remove the empty spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java@65
PS4, Line 65: eventIds_
Here and everywhere else, if this means "only" insert events, may I suggest using insertEventIds_?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java@69
PS4, Line 69: @param isInsertEvent if true, return list of versions for in-flight Insert events
            :    *              if false, return list of eventIds for in-flight DDL events
Should this be the other way around? i.e., return eventIds for in-flight insert events if it is insert events and versions if it is not.

Also, indentation is incorrect, please remove the extra spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java@87
PS4, Line 87:                      
Nit: Indentation incorrect, remove extra spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java@107
PS4, Line 107:                     
Nit: Indentation incorrect, remove extra spaces.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@773
PS4, Line 773: // long eventId_test = catalog_.public_eventId.consumeId();
Could you remove this comment?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@787
PS4, Line 787:     /**
             :      * Check self-event for in flight Insert event using eventId
             :      */
Nit: Not sure this is needed since this is an overridden method and its purpose is the same as described in L318.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@75
PS4, Line 75: *;
We should be careful with *. Are you sure we are importing everything from org.apache.impala.catalog?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4424
PS4, Line 4424: List of all catalog object that was insert into
Did you mean "List of all partitions we insert into"?


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4487
PS4, Line 4487:     // Firing insert events by making calls to HMS APIs can be slow for tables with
              :     // large number of partitions.
> this comment should be updated with the bulk API it should not be a problem
I see that MetastoreShim.fireInsertEvent() for hive-2 does not insert in bulk and is still making one RPC per partition. We should keep the earlier asynchronous fireInsertEvent() for hive-2.


http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

http://gerrit.cloudera.org:8080/#/c/15648/4/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java@33
PS4, Line 33: import org.apache.hadoop.hive.metastore.api.FireEventRequest;
            : import org.apache.hadoop.hive.metastore.api.FireEventRequestData;
            : import org.apache.hadoop.hive.metastore.api.FireEventResponse;
            : import org.apache.hadoop.hive.metastore.api.InsertEventRequestData;
These imports are not needed anymore.



-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Tue, 07 Apr 2020 23:47:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Xiaomeng Zhang (Code Review)" <ge...@cloudera.org>.
Xiaomeng Zhang has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................

IMPALA-8632: Add support for self-event detection for insert events

In case of INSERT_EVENTS if Impala inserts into a table it causes a
refresh to the underlying table/partition. This could be unnecessary
when there is only one Impala cluster in the system.
We can detect a self-event in such cases when the HMS API to fire a
listener event returns the event id. This is used by EventProcessor
to ignore the event when it is fetched later in the next polling cycle.

Testing:
Add testInsertFromImpala() in MetastoreEventsProcessorTest.java to test
insert event self-event detection when insert into table and partition.

Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M bin/impala-config.sh
M common/thrift/BackendGflags.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
17 files changed, 506 insertions(+), 205 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/15648/4
-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 4
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/5770/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 9
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Apr 2020 05:22:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................


Patch Set 9:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/15648/9/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15648/9/fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java@513
PS9, Line 513:         fireInsertEventHelper(msClient.getHiveClient(), insertEventInfos, dbName, tableName);
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/15648/9/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/15648/9/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@1004
PS9, Line 1004:       return fireInsertEventHelper(msClient.getHiveClient(), insertEventInfos, dbName, tableName);
line too long (98 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 9
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Comment-Date: Thu, 09 Apr 2020 04:31:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8632: Add support for self-event detection for insert events

Posted by "Xiaomeng Zhang (Code Review)" <ge...@cloudera.org>.
Xiaomeng Zhang has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/15648 )

Change subject: IMPALA-8632: Add support for self-event detection for insert events
......................................................................

IMPALA-8632: Add support for self-event detection for insert events

In case of INSERT_EVENTS if Impala inserts into a table it causes a
refresh to the underlying table/partition. This could be unnecessary
when there is only one Impala cluster in the system.
We can detect a self-event in such cases when the HMS API to fire a
listener event returns the event id. This is used by EventProcessor
to ignore the event when it is fetched later in the next polling cycle.

Testing:
Add testInsertFromImpala() in MetastoreEventsProcessorTest.java to test
insert event self-event detection when insert into table and partition.

Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
---
M be/src/common/global-flags.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/Db.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_event_processing.py
17 files changed, 504 insertions(+), 171 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/15648/9
-- 
To view, visit http://gerrit.cloudera.org:8080/15648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7873fbb2c159343690f93b9d120f6b425b983dcf
Gerrit-Change-Number: 15648
Gerrit-PatchSet: 9
Gerrit-Owner: Xiaomeng Zhang <xi...@cloudera.com>
Gerrit-Reviewer: Anurag Mantripragada <an...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Xiaomeng Zhang <xi...@cloudera.com>