You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org> on 2021/04/13 18:56:39 UTC

[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

Csaba Ringhofer has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17313


Change subject: WIP: IMPALA-10656: Fire insert events before commit
......................................................................

WIP: IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were alrady collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Icebert tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests
- I wasn't able to run EE tests yet, this is the reason behind the poc
  state
- done some basic manual testing

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 216 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8610/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 20:50:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#13).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 247 insertions(+), 226 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/13
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#9).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 9
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/7/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS7, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Apr 2021 07:44:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7066/ DRY_RUN=true


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 18:57:16 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#11).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 233 insertions(+), 224 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8572/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:43:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS6, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:03:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/4/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS4, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:22:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8637/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:19:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8574/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 13:20:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 14: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#5).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 220 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 3: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7066/


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:45:26 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 12: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876
PS12, Line 4876: These ACID_WRITE events are collected by HMS and become
               :    * visible during commit
I see that you added this. But I am not sure if this is correct. HMS metadata is not transactional and hence it has no awareness of when the events needs to be made visible. The events will be visible as soon as the HMS API succeeds AFAIK. It is upto the clients to determine if the ACID_WRITE event need to be processed or not based on the fact that the writeId in the event is committed or not. This commit or abort also generated a COMMIT_TXN or ABORT_TXN. But now may be we are going into implementation details.



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Apr 2021 22:30:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4902
PS8, Line 4902:       if (!isTransactional) {
              :         // add_partitions() RPC have already fired events for new partitions in the
              :         // non-transactional case.
              :         partSet = new HashSet<String>(partSet);
              :         partSet.removeAll(addedPartitionNames);
              :       }
> It is also not clear to me, but we worked like this before this patch. This
Okay, It took me a bit of poking around in Hive code to understand if we are doing something unusual here. I think this difference is worth documenting in a comment since it can be confusing otherwise. I added some comments about improvements to the comments.



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Apr 2021 22:12:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 14: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 27 Apr 2021 00:41:03 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4672
PS8, Line 4672: and
nit, s/and/an


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4868
PS8, Line 4868:    * @param partitionFilesMapBeforeInsert List of files for each partition in which data
              :    *                                     was inserted.
              :    * @param affectedExistingPartitions List of all affected partitions that were already
              :    *                                   existed.
              :    * @param newPartsCreated represents all the partitions names which got created by
              :    *                       the insert operation.
              :    * @param isInsertOverwrite indicates if the operation was an insert overwrite.
can you update the comment with the new params?


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4902
PS8, Line 4902:       if (!isTransactional) {
              :         // add_partitions() RPC have already fired events for new partitions in the
              :         // non-transactional case.
              :         partSet = new HashSet<String>(partSet);
              :         partSet.removeAll(addedPartitionNames);
              :       }
It is not clear to me why we do this for non-transactional tables. IIUC addedPartitionNames represents the newly added partitions. These partitions do not need a INSERT event irrespective of whether the table is transactional or not since HMS will generate a ADD_PARTITION event when we added them on line 4740. So this block will need to be done both transactional and non-transactional table right?


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4912
PS8, Line 4912: get("")
I think it is worth a comment here to mention that in case of unpartitioned table the key is "" and may be provide the method name in the backend as reference.


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4916
PS8, Line 4916: Preconditions
nit, move the preconditions above line 4914.


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4926
PS8, Line 4926:       Preconditions.checkState(!newFiles.isEmpty() || isInsertOverwrite);
              :       Preconditions.checkState(!partVals.isEmpty());
nit, move the preconditions check before line 4924



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 15:56:04 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/3/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS3, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 18:57:27 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 14:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7101/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 14
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 18:57:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/5/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS5, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()), overwrite,
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:24:00 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 8:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG@17
PS6, Line 17: 
> nit: some lines are longer than 72 chars
Done


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc@484
PS6, Line 484: current_file
> +1 for the rename :)
thx :)


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h@63
PS6, Line 63: all this for an existin
> nit: could you please mention this parameter in the comment?
Done


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc@496
PS6, Line 496: }
> nit: add extra line
Done


http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
File fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java:

http://gerrit.cloudera.org:8080/#/c/17313/6/fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java@879
PS6, Line 879:             unpartTable.getFileSystem(), new Path(unpartTable.getHdfsBaseDir()),
> line too long (91 > 90)
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Apr 2021 07:47:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876
PS12, Line 4876: 
               :    * 2. If the table is no
> I see that you added this. But I am not sure if this is correct. HMS metada
Removed this sentence. I definitely don't want to became a source of false information and I agree that this is implementation details.

It would be the best if there was a public document somewhere that describes these concepts and we could link it in situations like this.



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:02:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8611/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 20:56:54 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 11:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4672
PS8, Line 4672: an 
> nit, s/and/an
Done


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4868
PS8, Line 4868:    * insert.
              :    * @param table The target table.
              :    * @param updatedPartitions All affected partitions with the list of new files
              :    *                          inserted.
              :    * @param addedPartitionNames List of new partitions created during the insert.
              :    * @param isInsertOverwrite Indicates if the operation was an insert overwrite.
              :    * @param tblTxn Contains the transactionId and the writeId for the insert.
> can you update the comment with the new params?
Done


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4902
PS8, Line 4902:       if (!isTransactional) {
              :         // add_partitions() RPC have already fired events for new partitions in the
              :         // non-transactional case.
              :         partSet = new HashSet<String>(partSet);
              :         partSet.removeAll(addedPartitionNames);
              :       }
> It is not clear to me why we do this for non-transactional tables. IIUC add
It is also not clear to me, but we worked like this before this patch. This is both mentioned in comments and code works this way if I understood correctly.

See line 4843 in the original commit - filesBeforeInsert only contains partitions which are not new as we only load affectedExistingPartitions.
Then again in line 4901 we only load partitions that have already existed to partsPostInsert, and then generate the events based on partsPostInsert, and later fire events for the new partitions only in the transactional case.


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4912
PS8, Line 4912: ition w
> I think it is worth a comment here to mention that in case of unpartitioned
Done. This is the way we "name" them in general, not just in BE.


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4916
PS8, Line 4916: Preconditions
> nit, move the preconditions above line 4914.
Done


http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4926
PS8, Line 4926:       Preconditions.checkState(!newFiles.isEmpty() || isInsertOverwrite);
              :       Preconditions.checkState(!partVals.isEmpty());
> nit, move the preconditions check before line 4924
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 20:48:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4881
PS12, Line 4881:    * https://github.com/apache/hive/blob/25892ea409/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3251
line too long (114 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Apr 2021 15:01:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 4:

PS 4 fixes the issue found during the jenkins tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:21:19 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 6: Code-Review+1

(4 comments)

Looked at the backend, the change looks great!

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17313/6//COMMIT_MSG@17
PS6, Line 17: d,
nit: some lines are longer than 72 chars


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/exec/hdfs-table-sink.cc@484
PS6, Line 484: current_file
+1 for the rename :)


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h
File be/src/runtime/dml-exec-state.h:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.h@63
PS6, Line 63: staging_dir_to_clean_up
nit: could you please mention this parameter in the comment?


http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc
File be/src/runtime/dml-exec-state.cc:

http://gerrit.cloudera.org:8080/#/c/17313/6/be/src/runtime/dml-exec-state.cc@496
PS6, Line 496: }
nit: add extra line



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 16:12:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8591/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Apr 2021 08:01:55 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8571/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 14 Apr 2021 10:40:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 216 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#6).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the table
from HMS, and generated the insert events based on the difference between
the two snapshots. (e.g. which file was not present in the old snapshot
but are there in the new one).

Hive replication expects the insert events before the commit, so this may
potentially lead to issues there.

The solution is to collect the new files during the insert in the backend,
and send the insert events based on this file set. This wasn't very hard
to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 219 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#7).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 222 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 8: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 19 Apr 2021 13:08:43 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 11:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4867
PS11, Line 4867: of non-transactional tables if there are no existing partitions affected by this
               :    * insert.
Can you add a comment about the behavior of the transactional tables here.
May be adding something like below will be useful.

"This method is replicating what Hive does in case a table or partition is inserted into. There are 2 cases:
1. If the table is transactional, we should first generate ADD_PARTITION events for new partitions which are generated. This is taken care of in the updateCatalog method. Additionally, for each partition including existing partitions which were inserted into, this method creates a ACID_WRITE event using the HMS API addWriteNotificationLog.
2. If the table is not transactional, this method generates INSERT_EVENT for only the pre-existing partitions which were inserted into. This is in-line with what hive does (see https://github.com/apache/hive/blob/25892ea409b3351acbd409d1f66f399a43614798/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3251)


http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4885
PS11, Line 4885: insert event type.
Can you please add the following to the comment -- "This would show up in HMS as a ACID_WRITE event"



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 22 Apr 2021 22:10:41 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 12:

(3 comments)

Thanks Vihang for the detailed comments in the comments :) I only made small changes to your suggestions.

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/8/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4902
PS8, Line 4902:         .format("Invalid transaction id %s for generating insert events on table %s",
              :             txnId, table.getFullName()));
              :     Preconditions.checkState(!isTransactional || writeId > 0,
              :         String.format("Invalid write id %s for generating insert events on table %s",
              :             writeId, table.getFullName()));
              : 
> Okay, It took me a bit of poking around in Hive code to understand if we ar
Done


http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4867
PS11, Line 4867:   TODO: I am not sure that it is the right thing to connect event polling and
               :    *        
> Can you add a comment about the behavior of the transactional tables here.
Done


http://gerrit.cloudera.org:8080/#/c/17313/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4885
PS11, Line 4885: aram addedPartitio
> Can you please add the following to the comment -- "This would show up in H
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Apr 2021 15:02:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 12:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/12/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4876
PS12, Line 4876: These ACID_WRITE events are collected by HMS and become
               :    * visible during commit
> Removed this sentence. I definitely don't want to became a source of false 
Thanks.



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Reviewed-on: http://gerrit.cloudera.org:8080/17313
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 247 insertions(+), 226 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: WIP: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8565/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 13 Apr 2021 19:17:41 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#10).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 225 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 10
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 11:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8612/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 20 Apr 2021 20:56:36 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 12:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8631/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 23 Apr 2021 15:21:01 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#8).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 222 insertions(+), 215 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Vihang Karajgaonkar (Code Review)" <ge...@cloudera.org>.
Vihang Karajgaonkar has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 13: Code-Review+2

+1 and carrying forward Zoltan's +1 from earlier.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 17:49:40 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/17313/13/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@4880
PS13, Line 4880:    * https://github.com/apache/hive/blob/25892ea409/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L3251
line too long (114 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 13
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 26 Apr 2021 14:01:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Hello Vihang Karajgaonkar, Zoltan Borok-Nagy, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17313

to look at the new patch set (#12).

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................

IMPALA-10656: Fire insert events before commit

Before this fix Impala committed an insert first, then reloaded the
table from HMS, and generated the insert events based on the difference
between the two snapshots. (e.g. which file was not present in the old
snapshot but are there in the new one).

Hive replication expects the insert events before the commit, so this
may potentially lead to issues there.

The solution is to collect the new files during the insert in the
backend, and send the insert events based on this file set. This wasn't
very hard to do as we were already collecting the files in some cases:
- to move them from staging dir to their final location in case of
  non-partitioned tables
- to write the file list to snapshot files in case of Iceberg tables
This patch unifies the paths above and collects all information about
the created files regardless of the table type.

Testing:
- no new tests, insert events were already covered in
  test_event_processing.py and MetastoreEventsProcessorTest.java
- ran core tests

Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
---
M be/src/exec/hbase-table-sink.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-text-table-writer.cc
M be/src/exec/output-partition.h
M be/src/exec/parquet/hdfs-parquet-table-writer.cc
M be/src/runtime/dml-exec-state.cc
M be/src/runtime/dml-exec-state.h
M be/src/service/client-request-state.cc
M common/protobuf/control_service.proto
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/catalog/events/MetastoreEventsProcessorTest.java
12 files changed, 248 insertions(+), 226 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/13/17313/12
-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 12
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-10656: Fire insert events before commit

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17313 )

Change subject: IMPALA-10656: Fire insert events before commit
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8592/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17313
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2ed812dbcb5f55efff3a910a3daeeb76cd3295b9
Gerrit-Change-Number: 17313
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Vihang Karajgaonkar <vi...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 16 Apr 2021 08:05:54 +0000
Gerrit-HasComments: No