You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org> on 2021/09/15 19:40:19 UTC

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Vihang Karajgaonkar has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17848


Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
11 files changed, 927 insertions(+), 195 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has removed Zoltan Borok-Nagy from this change.  ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Removed reviewer Zoltan Borok-Nagy.
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteReviewer
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 924 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9485/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 18:02:52 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/util/event_processor_utils.py
10 files changed, 938 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11: Code-Review+1

The patch looks good to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 21:57:12 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       String serviceIdOfEvent = MetastoreEvents.getStringProperty(parametersFromEvent,
> Created IMPALA-10949 as a follow-up on this.
Thanks !



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 21:56:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 5:

(4 comments)

Did a first round. The change looks really great. I can take a deeper look on Wednesday.

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955
PS5, Line 955:     
nit: too much indent


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027
PS5, Line 1027:   
nit: +2 indent


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811
PS5, Line 1811:    
nit: indentation is off


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100
PS5, Line 100: idx >=0 && idx < insertEventIds_.size()
This precondition is checked by the Java runtime.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 19:23:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 6:

(11 comments)

The patch looks great. I mostly found nits.

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2716
PS6, Line 2716: the
nit: then?


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2733
PS6, Line 2733: value for a given partition key is null
When we build the partition map we only put non-null values into it at L2709.


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2741
PS6, Line 2741: hmsPartsToHdfsParts.get(hmsPartition);
You could iterate through 'hmsPartsToHdfsParts.entrySet()' in the for loop, so this lookup wouldn't be needed.


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@284
PS6, Line 284: + "to {}",
nit: Could fit prev line, then L285 could fit this line. Just like at L298.


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@602
PS6, Line 602: ForBatching
nit: maybe just 'getPartition()'? Adding 'ForBatching' makes me think that we only batch events of the same partition.


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1792
PS6, Line 1792:     
nit: indentation


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1836
PS6, Line 1836: //we
nit: missing space


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1836
PS6, Line 1836: a
nit: an


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1838
PS6, Line 1838: batchedEvents_.get(0)
nit: baseEvent_?


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4140
PS6, Line 4140: );
Shouldn't we still include 'e'?


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2335
PS6, Line 2335: table name
database?



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 14:23:26 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9463/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 18:56:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9459/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Sep 2021 20:02:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
11 files changed, 932 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9478/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Sep 2021 20:09:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/util/event_processor_utils.py
10 files changed, 938 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 04:39:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
11 files changed, 939 insertions(+), 214 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9537/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 22:48:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7487/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 23:13:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7482/


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 03:07:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
> Makes sense. We can take it up later. Please do file a jira to track this.
Created IMPALA-10949 as a follow-up on this.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 17:38:17 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7482/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 20:50:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
File fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@955
PS5, Line 955:     
> nit: too much indent
Done


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java@1027
PS5, Line 1027:   
> nit: +2 indent
Done


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713
PS5, Line 2713:       return hmsPartToHdfsPart.size();
> Shouldn't we return hmsPartToHdfsPart.size() since those are the actual par
yes, thanks for spotting that. Fixed.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279
PS2, Line 279:         if (!current.canBeBatched(next)) {
> Currently one of the conditions of batching two events together is - event 
Yes, the events definitely can be intermingled from multiple tables. In case of ALTER_PARTITION event it is less likely to happen since these events are transactional in nature (publish all or none) and they hold a eventId lock in HMS when being generated. However, this is highly dependent on the source application which generates the events. For example if Hive makes 10 alterPartition calls instead of 1 alterPartitions() for 10 partitions, the event stream may include events which are inter-mingled between different tables.

The batching logic currently optimizes for the common case of consecutive ALTER_PARTITION or INSERT events being generated by a single query. More sophisticated batching schemes like you mentioned can be explored but may increase the scope of the patch and I am inclined to commit this first and then incrementally improve it if needed. My primary concern about batching non-consecutive events is that we need to make sure that we are not introducing any subtle bugs because of the the fact that we are refreshing the partitions out of order then event stream.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:               MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1"));
> I understand that there may not be significant performance gain. I still fe
Okay. Let me look into this and see how much additional changes will need to made for this.


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1811
PS5, Line 1811: Ref
> nit: indentation is off
My IDE seems to like doing this. Not sure how to fix it. Manually edited it.


http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
File fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java@100
PS5, Line 100: dx);
> This precondition is checked by the Java runtime.
makes sense. Removed it.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 21 Sep 2021 19:46:35 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11:

> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7487/

I have disabled the test test_events_replication for transactional tables currently because it is flaky for the known issue reported in IMPALA-9057 for transactional tables. I will send out a separate patch to re-enable it.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 01 Oct 2021 17:29:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11: Code-Review+2

Looks great!


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 04 Oct 2021 09:28:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/5/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2713
PS5, Line 2713:       return hmsPartitions.size();
Shouldn't we return hmsPartToHdfsPart.size() since those are the actual partitions we are reloading in the cache?


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@279
PS2, Line 279:         if (!current.canBeBatched(next)) {
Currently one of the conditions of batching two events together is - event ids should be consecutive. This means that if there are events intermingled from multiple tables, then this batching logic won't be very effective. Do we see intermingled events in production? If yes, should we improve the batching logic here?


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:               MetastoreEventPropertyKey.CATALOG_VERSION.getKey(), "-1"));
> Since the default value of polling is 1s and we fetch the events in the bat
I understand that there may not be significant performance gain. I still feel that we should batch alter partition events from other clusters as well (and ignore self events when actually processing a batched event) considering that we would be syncing table till latest event id in future where self event logic will not be applicable anymore.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 20 Sep 2021 20:31:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7:

(3 comments)

> Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7482/

One of the test which failed was caused by the changes to the last patch set. The other test which failed doesn't look related to this patch. I will retrigger the precommit once more.

http://gerrit.cloudera.org:8080/#/c/17848/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2716
PS7, Line 2716: does not them
> nit: missing 'include'?
Done


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1792
PS6, Line 1792:     
> This indentation is inconsistent with L1812 (I think the latter is the corr
Thanks for the detailed example. I will try to see if I can change my IDE settings to change this. manually fixed it for now.


http://gerrit.cloudera.org:8080/#/c/17848/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/17848/7/tests/custom_cluster/test_events_custom_configs.py@270
PS7, Line 270: batch_events_2 > batch_events_1
> I think we should convert them to integers before comparing them.
Actually I found that all the instances of get_event_processor_metric needed a int. Hence I changed the method to return the int now.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 20:59:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 11:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7509/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 11
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 30 Sep 2021 22:27:38 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Reviewed-on: http://gerrit.cloudera.org:8080/17848
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Sourabh Goyal <so...@cloudera.com>
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
M tests/metadata/test_event_processing.py
M tests/util/event_processor_utils.py
11 files changed, 939 insertions(+), 214 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Sourabh Goyal: Looks good to me, but someone else must approve
  Zoltan Borok-Nagy: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 12
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       String serviceIdOfEvent = MetastoreEvents.getStringProperty(parametersFromEvent,
> I spent some time one this yesterday. It definitely looks like a couple of 
Makes sense. We can take it up later. Please do file a jira to track this.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 18:20:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@878
PS2, Line 878:    *                      when isInsertEventContext is false, it's version number to remove
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1277
PS2, Line 1277:       Preconditions.checkArgument(MetastoreEventType.CREATE_DATABASE.equals(getEventType()));
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1336
PS2, Line 1336:       Preconditions.checkArgument(MetastoreEventType.ALTER_DATABASE.equals(getEventType()));
line too long (92 > 90)


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1392
PS2, Line 1392:       Preconditions.checkArgument(MetastoreEventType.DROP_DATABASE.equals(getEventType()));
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2384
PS2, Line 2384:     mockEvents.addAll(createMockEvents(startEventId + mockEvents.size(), 1, "ALTER_PARTITION",
line too long (94 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Sep 2021 19:41:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2733
PS6, Line 2733: value for a given partition key is null
> There are 2 callers to this method. One of the callers come from the code-p
I see, thanks.


http://gerrit.cloudera.org:8080/#/c/17848/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/17848/7/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2716
PS7, Line 2716: does not them
nit: missing 'include'?


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@602
PS6, Line 602: 
> That is in fact true. Eg. in case of AlterPartitionEvent there is a partiti
I see, nevermind, let's keep the current name.


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1792
PS6, Line 1792:     
> I always get confused by this? I thought continuation indent is 4 spaces. I
This indentation is inconsistent with L1812 (I think the latter is the correct one).
AFAICT usually we don't indent lines belonging to the same text multiple times. Neither other complex expressions.
E.g. '+ "parameters..."' belongs to the text that starts at L1791 which is already indented with +4.

I think the following rule works for function calls:
* indent top-level arguments with +4 if need to use multiple lines
* indent arguments of nested function calls with additional +4, and so on.

E.g.:

 callFoo("very   ............" +
     "long       ............" +
     "first param",
     callBar(
         "argument" +
         "of" +
         "callBar",
         callDeep(9876 + 4321 +
             12345 +
             42,
             secondParamOfDeep)),
     "third argument of" +
     "callFoo");

These are not strict rules, we can use different indentation if it really makes the code more readable.

That said, if you like the current indentation better it's OK for me.


http://gerrit.cloudera.org:8080/#/c/17848/7/tests/custom_cluster/test_events_custom_configs.py
File tests/custom_cluster/test_events_custom_configs.py:

http://gerrit.cloudera.org:8080/#/c/17848/7/tests/custom_cluster/test_events_custom_configs.py@270
PS7, Line 270: batch_events_2 > batch_events_1
I think we should convert them to integers before comparing them.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 7
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 14:08:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
File fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java@878
PS2, Line 878:    *                      when isInsertEventContext is false, it's version number to remove
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1277
PS2, Line 1277:       Preconditions.checkArgument(MetastoreEventType.CREATE_DATABASE.equals(getEventType()));
> line too long (93 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1336
PS2, Line 1336:       Preconditions.checkArgument(MetastoreEventType.ALTER_DATABASE.equals(getEventType()));
> line too long (92 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1392
PS2, Line 1392:       Preconditions.checkArgument(MetastoreEventType.DROP_DATABASE.equals(getEventType()));
> line too long (91 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
> Would it not be more performant if we batch alter partition events for the 
Since the default value of polling is 1s and we fetch the events in the batch of 1000 from metastore, even with the existing patch a single stream of 1800 events is broken into several batches of few hundred events. There is a trade off here with the polling interval and the batch size. Higher the polling interval, bigger the batch sizes but we increase the time it takes eventually to sync all the events because we are polling at a slower rate. Hence the performance may not improve significantly since the batch sizes are always in a few hundreds based on the experiment that I ran.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1728
PS2, Line 1728:       this.baseEvent_ = baseEvent;
> Should we assert here that baseEvent can only be either AlterPartitionEvent
Wouldn't that be inconsistent with the use of generic types here. We only need to special-case for Insert since the self-event identifiers are different in case of insert. That is handled in the getSelfEventContext() method. In future we can probably batch ALTER_TABLE and ALTER_DATABASE events too using this same class with small modification to the code.


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@2384
PS2, Line 2384:     mockEvents.addAll(createMockEvents(startEventId + mockEvents.size(), 1, "ALTER_PARTITION",
> line too long (94 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 18:36:23 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Sourabh Goyal (Code Review)" <ge...@cloudera.org>.
Sourabh Goyal has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
Would it not be more performant if we batch alter partition events for the same table from other clusters as well? While processing a batched event, we are anyways looping over events to check if they can be skipped. Can add one more check for self events and ignore those. 

The BatchPartitionEvent process() code where we are looping over. 
for (T event : batchedEvents_) {
        if (event.canBeSkipped()) {
          ignoredPartitions.add(event);
        }
 }


http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1728
PS2, Line 1728:       this.baseEvent_ = baseEvent;
Should we assert here that baseEvent can only be either AlterPartitionEvent or InsertEvent?



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 17:56:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9464/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 19:11:02 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/2/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1641
PS2, Line 1641:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
> Okay. Let me look into this and see how much additional changes will need t
I spent some time one this yesterday. It definitely looks like a couple of days worth of effort. Given that the self-event logic is going to be simplified soon with https://gerrit.cloudera.org/#/c/17756/, I am inclined to take this up once we have this change merged so that it would become much easier to implement this suggestion. Thoughts?



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 2
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 17:41:44 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17848/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/3/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1644
PS3, Line 1644:       return versionNumberFromEvent_ == versionNumberOfEvent && serviceIdFromEvent_.equals(
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 18:37:19 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 919 insertions(+), 182 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
9 files changed, 920 insertions(+), 185 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................

IMPALA-9857: Batching of consecutive partition events

This patch improves the performance of events processor
by batching together consecutive ALTER_PARTITION or
INSERT events. Currently, without this patch, if
the events stream consists of a lot of consecutive
ALTER_PARTITION events which cannot be skipped,
events processor will refresh partition from each
event one by one. Similarly, in case of INSERT events
in a partition events processor refresh one partition
at a time.

By batching together such consecutive ALTER_PARTITION or
INSERT events, events processor needs to take lock on the table
only once per batch and can refresh all the partitions from
the events using multiple threads. For transactional (acid)
tables, this provides even significant performance gain
since currently we refresh the whole table in case of
ALTER_PARTITION or INSERT partition events. By batching them
together, events processor will refresh the table once per
batch.

The batch of eligible ALTER_PARTITION and INSERT events will
be processed as ALTER_PARTITIONS and INSERT_PARTITIONS event
respectively.

Performance tests:
In order to simulate bunch of ALTER_PARTITION and INSERT
events, a simple test was performed by running the following
query from hive:
insert into store_sales_copy partition(ss_sold_date_sk)
select * from store_sales;

This query generates 1824 ALTER_PARTITION and 1824 INSERT
events and time taken to process all the events generated
was measured before and after the patch for external and
ACID table.

Table Type              Before          After
======================================================
External table          75 sec          25 sec
ACID tables             313 sec         47 sec

Additionally, the patch also fixes a minor bug in
evaluateSelfEvent() method which should return false when
serviceId does not match.

Testing Done:
1. Added new tests which cover the batching logic of events.
2. Exhaustive tests.

Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
---
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/HdfsPartition.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/catalog/events/InFlightEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
M fe/src/main/java/org/apache/impala/catalog/events/MetastoreEventsProcessor.java
M fe/src/main/java/org/apache/impala/catalog/events/SelfEventContext.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
M tests/custom_cluster/test_events_custom_configs.py
11 files changed, 932 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/48/17848/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 4
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9522/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 10
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 28 Sep 2021 22:02:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9465/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 5
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Comment-Date: Thu, 16 Sep 2021 19:11:51 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/9496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 23 Sep 2021 23:33:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 9: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7487/


-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 9
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 24 Sep 2021 05:29:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-9857: Batching of consecutive partition events

Posted by "Yu-Wen Lai (Code Review)" <ge...@cloudera.org>.
Yu-Wen Lai has posted comments on this change. ( http://gerrit.cloudera.org:8080/17848 )

Change subject: IMPALA-9857: Batching of consecutive partition events
......................................................................


Patch Set 6:

(2 comments)

Thanks Vihang for introducing a way for batch event processing. I will rebase my patch for IMPALA-10923 on top of this patch.

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java
File fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java:

http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@270
PS6, Line 270: i=0, j=1
nit: spaces around assignment operator


http://gerrit.cloudera.org:8080/#/c/17848/6/fe/src/main/java/org/apache/impala/catalog/events/MetastoreEvents.java@1803
PS6, Line 1803:         for (T event : batchedEvents_) {
It seems ignoredPartitions still being processed here.



-- 
To view, visit http://gerrit.cloudera.org:8080/17848
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5d27a68a64436d31731e9a219b1efd6fc842de73
Gerrit-Change-Number: 17848
Gerrit-PatchSet: 6
Gerrit-Owner: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Quanlong Huang <hu...@gmail.com>
Gerrit-Reviewer: Sourabh Goyal <so...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Yu-Wen Lai <yu...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 22 Sep 2021 03:55:41 +0000
Gerrit-HasComments: Yes