You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org> on 2019/06/07 16:33:11 UTC

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Zoltan Borok-Nagy has uploaded this change for review. ( http://gerrit.cloudera.org:8080/13559


Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Initial prototype of INSERT into ACID tables.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
15 files changed, 144 insertions(+), 17 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/1
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 15: Code-Review+2

(3 comments)

Only minor comments.

http://gerrit.cloudera.org:8080/#/c/13559/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/15//COMMIT_MSG@17
PS15, Line 17: The write id is also set the for the HDFS table sinks. The sinks write the
             : files at their final destination which is an ACID base or delta directory.
nit: please wrap at 72


http://gerrit.cloudera.org:8080/#/c/13559/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/15/be/src/service/client-request-state.cc@1116
PS15, Line 1116:       Status debug_action_status = DebugAction(
               :           query_options(), "CLIENT_REQUEST_UPDATE_CATALOG");
               :       Status status = !debug_action_status.ok() ? debug_action_status :
               :           client.DoRpc(
               :               &CatalogServiceClientWrapper::UpdateCatalog, catalog_update, &resp);
nit: could be bit simpler IMO as

Status status = DebugAction(query_options(), "CLIENT_REQUEST_UPDATE_CATALOG");
if (status.ok()) {
  status = client.DoRpc(
              &CatalogServiceClientWrapper::UpdateCatalog, catalog_update, &resp);
}


http://gerrit.cloudera.org:8080/#/c/13559/15/testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test:

http://gerrit.cloudera.org:8080/#/c/13559/15/testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test@1
PS15, Line 1: ====
It would be also useful to test (dynamic) partitioned inserts with different failure modes. I'am ok with doing this in a different patch.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jul 2019 20:10:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 12:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3976/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Jul 2019 16:42:33 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 5:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36
PS5, Line 36: TODO in following commits:
It is not clear to me whether dynamic partitioning is supported or not.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214
PS5, Line 214:   // Create final_hdfs_file_name_prefix and tmp_hdfs_file_name_prefix.
Can you mention the transactional naming logic in the comment?


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696
PS5, Line 696:   return (IsS3APath(partition->final_hdfs_file_name_prefix.c_str()) && !overwrite_ &&
             :       state->query_options().s3_skip_insert_staging) ||
             :       IsTransactional();
nit: can you improve readability? e.g if (IsTransactional()) return true; could go to a separate line.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     if (exec_request().__isset.transaction_id) {
             :       RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
             :     RETURN_IF_ERROR(UpdateCatalog());
I am not sure here, but the order of UpdateCatalog() and CommitTransaction() seems non-trivial to me, especially with dynamic partitioning - I assume that UpdateCatalog() will create the new partitions in HMS, and in that case we can commit to partitions that do not exist yet in HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259
PS5, Line 1259:     try {
I think it would be more readable if the logic inside the try block would be in a separate function.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370
PS5, Line 1370: ImpalaException
Should this only catch ImpalaException? e.g. if there is a null pointer exception for some reason, we should still abort the transaction in my opinion.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372
PS5, Line 1372: abortTransaction
This can also throw an exception, for example if HMS was shut down, and this will hide the original exception.

I don't know how do we generally handle this in Impala - in this case, TransactionException could get a member like Exception causeOfAbort_ or originalException_.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611
PS5, Line 2611: TBLPROPERTIES('transactional'='true', 'transactional_properties'='insert_only');;
nit: extra ;


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7
PS5, Line 7: ---- QUERY
It would be great to also read the tables back with HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@113
PS5, Line 113: ====
Can you also add insert into/overwrite with dynamic partitioning?


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662
PS5, Line 662: execute_serially
As this test runs in a unique_database, I don't think that it needs to run serially. E.g. the next test needs this because it has INVALIDATE METADATA, which is a global operation.


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142
PS5, Line 142:   def test_acid_insert(self, vector):
What is the reason for not using a unique database?


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146
PS5, Line 146:     if HIVE_MAJOR_VERSION >= 3:
@SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3.


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147
PS5, Line 147:       # This client doesn't have the capability for ACID tables, but we only need
             :       # to drop and create such tables, and table properties are preserved during
             :       # those operations.
             :       capability_check = self.hive_client.getMetaConf("metastore.client.capability.check")
             :       self.hive_client.setMetaConf("metastore.client.capability.check", "false")
I do not understand why this is needed.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 19 Jun 2019 12:44:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 20: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 20
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 13:45:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3629/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 14 Jun 2019 17:19:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 13:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3994/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jul 2019 11:21:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/13559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@410
PS1, Line 410:     transactionId_ = queryCtx.isSetTransaction_id() ? queryCtx.getTransaction_id() : null; 
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@410
PS1, Line 410:     transactionId_ = queryCtx.isSetTransaction_id() ? queryCtx.getTransaction_id() : null; 
line too long (91 > 90)



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 16:33:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#3).

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Initial prototype of INSERT into ACID tables.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
18 files changed, 232 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/3
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#2).

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Initial prototype of INSERT into ACID tables.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
16 files changed, 180 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/2
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#19).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write
the files at their final destination which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
36 files changed, 943 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/19
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3734/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 14:51:53 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#10).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
25 files changed, 731 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/10
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13559/6/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/13559/6/tests/query_test/test_insert.py@26
PS6, Line 26: from tests.common.environ import HIVE_MAJOR_VERSION
flake8: F401 'tests.common.environ.HIVE_MAJOR_VERSION' imported but unused


http://gerrit.cloudera.org:8080/#/c/13559/6/tests/query_test/test_insert.py@155
PS6, Line 155: #
flake8: E116 unexpected indentation (comment)



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jun 2019 14:43:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#12).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following PS of this commit:
* add tests for error paths via debug actions

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
29 files changed, 837 insertions(+), 125 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/12
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 11:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3816/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jul 2019 15:07:49 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 10:

(13 comments)

I had to refresh my memory on how the query lifecycle worked for inserts, but I think I paged enough back in to have some valid thoughts on it.

The biggest problem I think is the one that you identified - that the transaction commit, Impala catalog and HMS updates are not atomic from  the point of view of Impala, so we're opening ourselves up to various anomalies. I haven't really thought through exactly what anomalies are possible, but it would be good to avoid them entirely, if possible.

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h@264
PS10, Line 264: long
int64_t?


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@500
PS10, Line 500:   // However, for transactional tables we should create a new empty base directory.
Why? I assume there is some good reason but it's not immediately obvious to me.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@700
PS10, Line 700:   if (IsTransactional()) return true;
This one seems more obvious to me, but makes me think that the class comment should have a brief expectation of the directory layout and behaviour of ACID inserts. Or a pointer to something that explains it.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: FinalizeHdfsInsert
We should maybe rename this to FinalizeHdfsDml(), now or later.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: Status Coordinator::FinalizeHdfsInsert() {
I think we should probably do the transaction abort in this function, since it will happen asynchronously and not depend on the client unregistering the query. I think it fits conceptually with removing the staging directory and that cleanup.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@307
PS10, Line 307:   /// True if there is an open transaction.
Hive ACID transaction, just to be clear.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@308
PS10, Line 308:   bool in_transaction_ = false;
I think this would best fit in DmlExecState.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@720
PS10, Line 720:     DCHECK(exec_request().__isset.transaction_id);
I think we would prefer to abort the transaction earlier in the query lifecycle. Query unregistration does not necessarily happen in a timely fashion because it depends on client RPCs. Coordinator::FinalizeHdfsInsert() is maybe the right place.

This could also be a helper, maybe - Done() is getting to the point where it doesn't fit on a screen.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@820
PS10, Line 820:       //TODO: HMS is not updated at this point, only in UpdateCatalog(). But
Zoli explained this to me out-of-band. I think I agree that having the catalog commit the transaction is the right approach, since we'd want transaction commit and the Impala catalog update to be an atomic operation from the point-of-view of impalads.

It looks like there's an add_dynamic_partitions() method that takes a transaction ID, so I think in theory we could solve that partition creation problem by having the impalad create the partitions in the transaction, but that would still leave a window of inconsistency.

It's a little asymmetrical to have the Impala start the transaction and catalogd commit it, but it seems less weird than the consistency issues.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h
File be/src/service/frontend.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@170
PS10, Line 170:   /// Commits transaction with the given transaction id.
Are there any invariants we should document for these methods? I guess we're just assuming that this coordinator had opened a transaction previous.

Are there any interesting failure modes to document?


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@171
PS10, Line 171: long
I think we want to use int64_t, to match the thrift definition and our style guide.


http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@179
PS10, Line 179:   private Long writeId_;
This is just a nit, but it would be slightly more intuitive to me to use -1 here to represent not set, since that's what's used in the backend for write IDs and generally elsewhere in the frontend for similar cases (cardinality estimates, etc).


http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@3
PS10, Line 3: RESET insertonly_nopart_insert
We've generally preferred using unique_database for tables that are mutated as part of tests so that the tests can run in parallel. The execute_serially thing in test_insert is a bit of a legacy thing.

Not a blocker, or maybe I'm missing a reason not to use unique_database.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 23:46:56 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write
the files at their final destination which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Reviewed-on: http://gerrit.cloudera.org:8080/13559
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
36 files changed, 943 insertions(+), 137 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 21
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 15:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/13559/15//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/15//COMMIT_MSG@17
PS15, Line 17: The write id is also set the for the HDFS table sinks. The sinks write the
             : files at their final destination which is an ACID base or delta directory.
> nit: please wrap at 72
Done


http://gerrit.cloudera.org:8080/#/c/13559/15/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/15/be/src/service/client-request-state.cc@1116
PS15, Line 1116:       Status debug_action_status = DebugAction(
               :           query_options(), "CLIENT_REQUEST_UPDATE_CATALOG");
               :       Status status = !debug_action_status.ok() ? debug_action_status :
               :           client.DoRpc(
               :               &CatalogServiceClientWrapper::UpdateCatalog, catalog_update, &resp);
> nit: could be bit simpler IMO as
Done


http://gerrit.cloudera.org:8080/#/c/13559/15/testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test:

http://gerrit.cloudera.org:8080/#/c/13559/15/testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test@1
PS15, Line 1: ====
> It would be also useful to test (dynamic) partitioned inserts with differen
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 15:56:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 12: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Jul 2019 23:06:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#16).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write the
files at their final destination which is an ACID base or delta directory.
There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
36 files changed, 898 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/16
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 16
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 11:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@25
PS11, Line 25: The Backend commits/aborts the transaction by calling the Frontend via
I think this needs to be updated to explain what is done on the catalog.


http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@34
PS11, Line 34:   tests only run with exhaustive exploration strategy)
I'm thinking about how we can test more of the error paths, that seems like the biggest gap at the moment. It might be worth adding cancellation tests like TestCancellationSerial.test_cancel_insert() that cancels a query against an ACID table, and I suppose also one that cancels a select query against an acid table.

Ideally we would also test some of the other error paths (e.g. HMS returns error) - I think we'd have to add failpoints for that. But I don't want to expand the scope of this too much - maybe take a look to see what it would take to do that. Don't think this is a blocker.


http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@37
PS11, Line 37: * add locks and heartbeats
What are the implications of not having heartbeats? Will long-running queries against ACID tables have the transaction time out on the HMS?


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h@134
PS11, Line 134: /
"base or delta". It was confusing whether it was a path separate or or not


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h@493
PS11, Line 493:   /// Gather and publish all required updates to the metastore
Can you document the postcondition around transactions being committed/aborted? It seems like we're guaranteeing that DML transactions on partitioned tables should be committed or aborted when this function returns.


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1106
PS11, Line 1106:       Status rpc_status =client.DoRpc(&CatalogServiceClientWrapper::UpdateCatalog,
missing space


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1107
PS11, Line 1107:           catalog_update, &resp);
It would be more explicit to
  if (!rpc_status.ok()) {
   if (InTransaction()) AbortTransaction();
   return rpc_status;
  }

Actually could maybe combine the checks to reduce the number of error paths like:

  Status status = client.DoRpc();
  if (status.ok()) status = Status(resp.result.status);
  if (!status.ok()...) {
    if (InTransaction()) AbortTransaction();
    LOG(ERROR) << "ERROR Finalizing DML: " << status.GetDetail();
    return status;
  }

This has a slight behaviour change of logging RPC errors, but that seems like a reasonable thing to do. This is a bit of a nit-pick, and your code is correct, but this kind of error handling is easy to introduce bugs into. I think the current code is ok, will leave it to you to determine if it's an improvement or not.


http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift@361
PS11, Line 361: traget
target


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@941
PS11, Line 941:         //TODO writeIDs may also be loaded in other code paths.
I guess you didn't actually add this comment, is there a JIRA for this?


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java
File fe/src/main/java/org/apache/impala/common/TransactionException.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@21
PS11, Line 21:  * Thrown for errors related to ACID transactions.
I feel like this could be a little more specific, e.g. "Thrown for errors that occur when interacting with ACID transactions, e.g. failures to open, commit, or abort a transaction". I.e. it wouldn't include something like failing a query because it uses an unsupported table type or something.


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@22
PS11, Line 22:  *
unnecesssary blank line?


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3801
PS11, Line 3801: failers
failures


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3803
PS11, Line 3803:       // moved to ClientRequestState::UpdateCatalog().
I think handling the abort on the client is probably the right thing to do anyway, because we want to abort the transaction if there's an RPC error talking to the catalog anyway. So we could actually just say "The client, i.e. query coordinator, is always responsible for aborting transactions when queries hit errors."


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1266
PS11, Line 1266:       //TODO: should load table write ids in transaction context.
is there a jira?


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1379
PS11, Line 1379:         catch (TransactionException te) {
nit: extra newline before catch


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1629
PS11, Line 1629:   private static boolean stmtNeedsTransaction(AnalysisResult analysisResult) {
The name/comment is a bit misleading, since it depends on whether the query is referencing an ACID table or not.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 18 Jul 2019 04:17:05 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#11).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
28 files changed, 765 insertions(+), 126 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/11
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 3:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3578/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 3
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Wed, 12 Jun 2019 17:15:45 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 1:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3536/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 17:15:23 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 18:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4030/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 18
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 19:45:24 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 19:

CDP build ran successfully: https://jenkins.impala.io/job/ubuntu-16.04-from-scratch-cdp-hive/34/

Invoking GVO.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 07:14:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 10:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.h@264
PS10, Line 264: long
> int64_t?
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@500
PS10, Line 500:   // However, for transactional tables we should create a new empty base directory.
> Why? I assume there is some good reason but it's not immediately obvious to
Sorry, I left out some information. We only need to do that in case of INSERT OVERWRITEs with empty result sets. For insert-only ACID tables we don't delete old files, we just create a new base directory.

Extended the comment and modified the condition.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/exec/hdfs-table-sink.cc@700
PS10, Line 700:   if (IsTransactional()) return true;
> This one seems more obvious to me, but makes me think that the class commen
Extended the class comment.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: Status Coordinator::FinalizeHdfsInsert() {
> I think we should probably do the transaction abort in this function, since
Thanks, done.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/runtime/coordinator.cc@571
PS10, Line 571: FinalizeHdfsInsert
> We should maybe rename this to FinalizeHdfsDml(), now or later.
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@307
PS10, Line 307:   /// True if there is an open transaction.
> Hive ACID transaction, just to be clear.
Since then removed this bool because it caused more issues than solved because the handover of the transaction ownership between the FE and BE should happen immediately.

Now I'm using a member of FinalizeParams to detect whether we are in a transaction or not although it might not be the cleanest solution.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.h@308
PS10, Line 308:   bool in_transaction_ = false;
> I think this would best fit in DmlExecState.
We'll have transactions for queries as well.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@720
PS10, Line 720:     DCHECK(exec_request().__isset.transaction_id);
> I think we would prefer to abort the transaction earlier in the query lifec
Thanks for the suggestion, moved it to FinalizeHdfsDml()


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@820
PS10, Line 820:       //TODO: HMS is not updated at this point, only in UpdateCatalog(). But
> Zoli explained this to me out-of-band. I think I agree that having the cata
Added commit to catalog. Was also thinking about adding abort as well, but the error handling in CatalogOpExecutor.UpdateCatalog() was inconsistent (sometimes an exception is thrown, sometimes we set an error in the result object), and in some cases we still need to abort the transaction from the BE.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h
File be/src/service/frontend.h:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@170
PS10, Line 170:   /// Commits transaction with the given transaction id.
> Are there any invariants we should document for these methods? I guess we'r
Removed it since now we commit through the Catalog.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/frontend.h@171
PS10, Line 171: long
> I think we want to use int64_t, to match the thrift definition and our styl
Done. I think I got used to using longs in Java code.


http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/13559/10/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@179
PS10, Line 179:   private Long writeId_;
> This is just a nit, but it would be slightly more intuitive to me to use -1
Done


http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/10/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@3
PS10, Line 3: RESET insertonly_nopart_insert
> We've generally preferred using unique_database for tables that are mutated
I followed the pattern of TestInsertQueries.test_insert().

Also, this way we can easily test different file formats, whereas being able to create tables with "STORED AS $FILEFORMAT" clause requires a bit additional work for the .test files AFAICT.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 03 Jul 2019 14:28:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#6).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
25 files changed, 714 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/6
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@34
PS11, Line 34: 
> It's a good idea. I'm planning to add such tests in the following PS. Until
I added some failure tests with debug actions.


http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/10/be/src/service/client-request-state.cc@720
PS10, Line 720:   UpdateEndTime();
> Thanks for the suggestion, moved it to FinalizeHdfsDml()
We have quite a lot RETURN_IF_ERRORs, so I added the abort here again as a finalization if something went wrong.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jul 2019 10:55:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 14:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3995/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jul 2019 11:14:27 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#18).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write
the files at their final destination which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
36 files changed, 942 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/18
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 18
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Tim Armstrong (Code Review)" <ge...@cloudera.org>.
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 15: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 00:30:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 17: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 18:40:34 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3996/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 25 Jul 2019 12:15:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 7:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     RETURN_IF_ERROR(GetCoordinator()->Wait());
             :     if (exec_request().__isset.transaction_id) {
             :       RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
> UpdateCatalog() won't load the new write ids and won't see the new files un
I am leaning towards doing UpdateCatalog() first:
- UpdateCatalog() seems a more complex operation, so there is probably a bigger chance for failure.
- If UpdateCatalog() succeeds but the commit fails, then the only side effect can be the creation of some empty partitions (insert overwrite doesn't delete partitions, as we checked together), which does not change the contents of the table. On the other hand, commit can add files to both old and new partitions, so if adding the new partitions fails, then the insert becomes "partially successful", which is not ok in the ACID world.
- It seems logical to do "commit" as late as possible, when there are not a lot of things that can go wrong. Getting an error for your INSERT query but still committing rows is weird.


http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@722
PS6, Line 722:     in_transaction_ = false;
Can you add query_events_->MarkEvent("Transaction aborted");?


http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@820
PS6, Line 820:       in_transaction_ = false;
Can you add query_events_->MarkEvent("Transaction committed");?


http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift
File common/thrift/DataSinks.thrift:

http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift@81
PS6, Line 81: write_id
Here and other similar variables: I would consider adding "acid_" or "transactional_" to make these variables' role clearer to people who don't know Hive acid terminology.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@677
PS6, Line 677:    * Parses and plans a query in order to generate its explain string. This method does
I wonder how to handle EXPLAIN - the backend won't be there to commit/abort the query. It would be probably the best to avoid opening a transaction/getting new write IDs when executing EXPLAIN.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1257
PS6, Line 1257: 
Can you add timeline.markEvent("Transaction opened");?

Adding transaction related events to the timeline will make it clear from the profile whether the query was treated as  transactional or not.

On the other hand, if we are doing this before analyzeAndAuthorize(), then we give a user (without the right to access a transactional table) the possibility to "probe" for the table's existence, as the profile will be different after select * from tblname if tblname exists and is transactional.


http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java@1378
PS7, Line 1378:           abortTransaction(queryCtx.getTransaction_id());
This may be out of the scope of this patch, but I would consider making abortTransaction() async - so another thread would do the abort calls, and we would just add a transaction ID to its queue here.

The reason is that doing abort a bit later doesn't seem problematic to me, while adding further delay to a failed query can be frustrating + it would allow us to abort a query later even if HMS is unresponsive at the moment.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 7
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Mon, 24 Jun 2019 10:21:20 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#13).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write the
files at their final destination which is an ACID base or delta directory.
There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
30 files changed, 889 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/13
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3612/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 15:40:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#4).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new tables

TODO in this commit:
* add comments
* add Hive interop tests

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/query_test/test_insert.py
24 files changed, 530 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/4
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3733/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 14:44:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#8).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
25 files changed, 724 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/8
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4032/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 20:35:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 12:

(16 comments)

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@25
PS11, Line 25: Testing:
> I think this needs to be updated to explain what is done on the catalog.
Done


http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@34
PS11, Line 34: * add tests for error paths via debug actions
> I'm thinking about how we can test more of the error paths, that seems like
It's a good idea. I'm planning to add such tests in the following PS. Until that I added a TODO to the commit msg.


http://gerrit.cloudera.org:8080/#/c/13559/11//COMMIT_MSG@37
PS11, Line 37: * add locks and heartbeats (without heartbeats long-running transactions
> What are the implications of not having heartbeats? Will long-running queri
Yes, they will be aborted by HMS. Extended the commit message with this information.


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h
File be/src/exec/hdfs-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/exec/hdfs-table-sink.h@134
PS11, Line 134:  
> "base or delta". It was confusing whether it was a path separate or or not
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.h@493
PS11, Line 493:   Status FetchRowsInternal(const int32_t max_rows, QueryResultSet* fetched_rows)
> Can you document the postcondition around transactions being committed/abor
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1106
PS11, Line 1106:       Status status = client.DoRpc(&CatalogServiceClientWrapper::UpdateCatalog,
> missing space
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/be/src/service/client-request-state.cc@1107
PS11, Line 1107:           catalog_update, &resp);
> It would be more explicit to
Yeah, I think your suggestion makes the code cleaner. Thanks, done.


http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift
File common/thrift/Frontend.thrift:

http://gerrit.cloudera.org:8080/#/c/13559/11/common/thrift/Frontend.thrift@361
PS11, Line 361: target
> target
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@941
PS11, Line 941:         loadValidWriteIdList(client);
> I guess you didn't actually add this comment, is there a JIRA for this?
No, I didn't write it and there is no Jira for that. I consulted with the author and this comment was just a reminder to do some investigation if whether we need to load the write ids at other code paths. So the text of it meant to be 'may also need to be loaded' I think.

I removed this line because it's not an exact TODO.


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java
File fe/src/main/java/org/apache/impala/common/TransactionException.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@21
PS11, Line 21:  * Thrown for errors that occur when interacting with ACID transactions,
> I feel like this could be a little more specific, e.g. "Thrown for errors t
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/common/TransactionException.java@22
PS11, Line 22:  * e.g. failures to open, commit, or abort a transaction. Or, failing to
> unnecesssary blank line?
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3801
PS11, Line 3801: e of fa
> failures
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3803
PS11, Line 3803:       if (update.isSetTransaction_id()) {
> I think handling the abort on the client is probably the right thing to do 
Yeah it's reasonable, and it's also a better comment than 'error handling is confusing in this method' :)


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1266
PS11, Line 1266:     try {
> is there a jira?
Opened IMPALA-8788. Done.


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1379
PS11, Line 1379:         } catch (TransactionException te) {
> nit: extra newline before catch
Done


http://gerrit.cloudera.org:8080/#/c/13559/11/fe/src/main/java/org/apache/impala/service/Frontend.java@1629
PS11, Line 1629:    * Opens a new transaction.
> The name/comment is a bit misleading, since it depends on whether the query
Yeah I was struggling with the name but couldn't come up with a better one.

Anyway, I identified a bug when we inserted data into a non-ACID table from an ACID table (we wrote the non-ACID table as it was an ACID one). So I changed the code a bit and don't use this function. Added tests for it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 12
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Wed, 24 Jul 2019 16:04:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3766/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 10
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 27 Jun 2019 14:26:57 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#15).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write the
files at their final destination which is an ACID base or delta directory.
There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
30 files changed, 892 insertions(+), 128 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/15
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 15
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Csaba Ringhofer (Code Review)" <ge...@cloudera.org>.
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 19: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 19
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 21:15:17 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#14).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write the
files at their final destination which is an ACID base or delta directory.
There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
30 files changed, 891 insertions(+), 127 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/14
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 14
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@722
PS6, Line 722:     in_transaction_ = false;
> Can you add query_events_->MarkEvent("Transaction aborted");?
See answer at Frontend.java.


http://gerrit.cloudera.org:8080/#/c/13559/6/be/src/service/client-request-state.cc@820
PS6, Line 820:       // UpdateCatalog() needs the transaction to be committed to load the new valid
> Can you add query_events_->MarkEvent("Transaction committed");?
See answer at Frontend.java.


http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift
File common/thrift/DataSinks.thrift:

http://gerrit.cloudera.org:8080/#/c/13559/6/common/thrift/DataSinks.thrift@81
PS6, Line 81: write_id
> Here and other similar variables: I would consider adding "acid_" or "trans
Actually at first I called it "acid_write_id", but then I noticed that we already use the term "write id" at a lot of places so I just switched to "write_id" for consistency and brevity.
Added the word "ACID" to the comment.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@677
PS6, Line 677:    * Parses and plans a query in order to generate its explain string. This method does
> I wonder how to handle EXPLAIN - the backend won't be there to commit/abort
Thanks, I didn't think about EXPLAIN. Done.


http://gerrit.cloudera.org:8080/#/c/13559/6/fe/src/main/java/org/apache/impala/service/Frontend.java@1257
PS6, Line 1257: 
> Can you add timeline.markEvent("Transaction opened");?
Yeah, I think it is a valid security concern. And maybe we shouldn't open a transaction until we are authorized to execute the query. And we should only load the write ids and list of files if we have an open transaction.

Will leave it as it is right now, updated the TODO in the comment.


http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/7/fe/src/main/java/org/apache/impala/service/Frontend.java@1378
PS7, Line 1378:     } catch (Exception e) {
> This may be out of the scope of this patch, but I would consider making abo
Added TODO about it.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 8
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Tue, 25 Jun 2019 14:07:13 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#9).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
25 files changed, 723 insertions(+), 118 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/9
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 9
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 20:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/4672/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 20
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 07:14:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 16:

Fixed the failed tests from https://jenkins.impala.io/job/ubuntu-16.04-from-scratch-cdp-hive/31/testReport/


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 16
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 14:55:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#5).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for queries that refer to
transactional tables. For INSERT statements that write insert-only
ACID tables it also allocates a write ID. The Frontend aborts the
transaction if an error occurs during analysis/planning.

The Backend gets the transaction id in TExecRequestState and the
write id is set for the HDFS table sinks. The sinks write the files
at their final destination which is an ACID base/delta directory.
There is no need for finalization of transactional INSERTS.

ClientRequestState commits the transaction in WaitInternal() if
everything went well. If the transaction is still open in Done(), it
means there was an error, therefore the transaction needs to be aborted.

The Backend commits/aborts the transaction by calling the Frontend via
JNI.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats
* implement TRUNCATE (maybe in another commit)
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
25 files changed, 662 insertions(+), 119 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/5
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 5
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1267
PS4, Line 1267:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1269
PS4, Line 1269:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1322
PS4, Line 1322:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1325
PS4, Line 1325:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1329
PS4, Line 1329:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1334
PS4, Line 1334:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1339
PS4, Line 1339:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1345
PS4, Line 1345:   
line has trailing whitespace


http://gerrit.cloudera.org:8080/#/c/13559/4/fe/src/main/java/org/apache/impala/service/Frontend.java@1367
PS4, Line 1367:       } 
line has trailing whitespace



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 4
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Thu, 13 Jun 2019 14:57:16 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Hello Csaba Ringhofer, Todd Lipcon, Tim Armstrong, Impala Public Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/13559

to look at the new patch set (#17).

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................

IMPALA-8636: Implement INSERT for insert-only ACID tables

This commit adds INSERT support for insert-only ACID tables.

The Frontend opens a transaction for INSERT statements when the target
table is transactional. It also allocates a write ID for the target
table. The Frontend aborts the transaction if an error occurs during
analysis/planning.

The Backend gets the transaction id and the write id in TFinalizeParams.
The write id is also set the for the HDFS table sinks. The sinks write
the files at their final destination which is an ACID base or delta
directory. There is no need for finalization of transactional INSERTS.

When the sinks finished with writing the data, the Coordinator invokes
updateCatalog() on catalogd which also commits the transaction if
everything went well, otherwise the Coordinator aborts the transaction.

Testing:
* added new tables during dataload
* added acid-insert.test file with INSERT statements against the new
  tables
* test insertions between ACID and non-ACID tables
* test error scenarios via debug actions
* added integration test with Hive to test_hms_integration.py. The test
  inserts data with Impala and reads with Hive. (These integration
  tests only run with exhaustive exploration strategy)

TODO in following commits:
* add locks and heartbeats (without heartbeats long-running transactions
  might be aborted by HMS)
* implement TRUNCATE
* CTAS creates files in the 'root' directory of the table/partition. It
  is handled correctly during SELECT, but would be better to create a
  base directory from the beginning. Hive creates a delta directory
  for CTAS.

Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
---
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M be/src/runtime/coordinator.cc
M be/src/runtime/coordinator.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/util/jni-util.h
M common/thrift/CatalogService.thrift
M common/thrift/DataSinks.thrift
M common/thrift/Frontend.thrift
M common/thrift/ImpalaInternalService.thrift
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/DropTableOrViewStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/LoadDataStmt.java
M fe/src/main/java/org/apache/impala/analysis/TruncateStmt.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
A fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzerTest.java
M testdata/datasets/functional/functional_schema_template.sql
M testdata/datasets/functional/schema_constraints.csv
A testdata/workloads/functional-query/queries/QueryTest/acid-insert-fail.test
A testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
M testdata/workloads/functional-query/queries/QueryTest/acid-negative.test
A testdata/workloads/functional-query/queries/QueryTest/acid-nonacid-insert.test
M tests/metadata/test_hms_integration.py
M tests/query_test/test_insert.py
36 files changed, 956 insertions(+), 137 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/13559/17
-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/3692/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jun 2019 15:23:04 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Zoltan Borok-Nagy (Code Review)" <ge...@cloudera.org>.
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/13559/5//COMMIT_MSG@36
PS5, Line 36: TODO in following commits:
> It is not clear to me whether dynamic partitioning is supported or not.
Yeah, it is supported. Added tests about it.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc
File be/src/exec/hdfs-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@214
PS5, Line 214:   // Create final_hdfs_file_name_prefix and tmp_hdfs_file_name_prefix.
> Can you mention the transactional naming logic in the comment?
Added comment.


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/exec/hdfs-table-sink.cc@696
PS5, Line 696:   closed_ = true;
             : }
             : 
> nit: can you improve readability? e.g if (IsTransactional()) return true; c
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/13559/5/be/src/service/client-request-state.cc@817
PS5, Line 817:     RETURN_IF_ERROR(GetCoordinator()->Wait());
             :     if (exec_request().__isset.transaction_id) {
             :       RETURN_IF_ERROR(frontend_->CommitTransaction(exec_request().transaction_id));
             :       in_transaction_ = false;
             :     }
> I am not sure here, but the order of UpdateCatalog() and CommitTransaction(
UpdateCatalog() won't load the new write ids and won't see the new files until they are not committed.

The problematic scenario is when we commit the transaction successfully but UpdateCatalog() fails to update HMS.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1259
PS5, Line 1259:     try {
> I think it would be more readable if the logic inside the try block would b
This 'doCreateExecRequest()' method also exist for the same reason. I'm not sure about the usefulness of adding another method.


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1370
PS5, Line 1370: Exception e) {
> Should this only catch ImpalaException? e.g. if there is a null pointer exc
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1372
PS5, Line 1372: try {
> This can also throw an exception, for example if HMS was shut down, and thi
For the general case maybe a composite exception type is the solution.

Here I think it's best to handle (log) the transaction exception and throw the original one coming from analysis/planning.


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/datasets/functional/functional_schema_template.sql@2611
PS5, Line 2611: ====
> nit: extra ;
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test
File testdata/workloads/functional-query/queries/QueryTest/acid-insert.test:

http://gerrit.cloudera.org:8080/#/c/13559/5/testdata/workloads/functional-query/queries/QueryTest/acid-insert.test@7
PS5, Line 7: ---- QUERY
> It would be great to also read the tables back with HMS.
I have some interop tests at test_hms_integration.py


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py
File tests/metadata/test_hms_integration.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/metadata/test_hms_integration.py@662
PS5, Line 662: acid
> As this test runs in a unique_database, I don't think that it needs to run 
Thanks for the tip!


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py
File tests/query_test/test_insert.py:

http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@142
PS5, Line 142:   def test_acid_insert(self, vector):
> What is the reason for not using a unique database?
The tables being used are tied to these tests only. They don't even contain any data initially. I followed the pattern of test_insert().


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@146
PS5, Line 146:     # We need to turn off capab
> @SkipIfHive2.acid already ensures that HIVE_MAJOR_VERSION >= 3.
Done


http://gerrit.cloudera.org:8080/#/c/13559/5/tests/query_test/test_insert.py@147
PS5, Line 147:     # this python client doesn't have the capability for handling ACID tables. But we only
             :     # need to drop and create such tables, and table properties are preserved during
             :     # those operations and this is enough for the tests (A table is ACID if it has the
             :     # relevant table properties).
             :     capability_check = self.hive_client.getMetaConf("metastore.client.capability
> I do not understand why this is needed.
Updated the comment.



-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 6
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Thu, 20 Jun 2019 14:45:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 16:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/4018/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 16
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 15:36:09 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 20: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 20
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Sat, 27 Jul 2019 07:14:30 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: WIP: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 2:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/3537/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 2
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Fri, 07 Jun 2019 17:16:56 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8636: Implement INSERT for insert-only ACID tables

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/13559 )

Change subject: IMPALA-8636: Implement INSERT for insert-only ACID tables
......................................................................


Patch Set 17:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/4020/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/13559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id6c36fa6902676f06b4e38730f737becfc7c06ad
Gerrit-Change-Number: 13559
Gerrit-PatchSet: 17
Gerrit-Owner: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <cs...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <ta...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Zoltan Borok-Nagy <bo...@cloudera.com>
Gerrit-Comment-Date: Fri, 26 Jul 2019 16:37:02 +0000
Gerrit-HasComments: No