You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@impala.apache.org by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org> on 2021/06/07 18:56:47 UTC

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Wenzhe Zhou has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17553


Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

WIP IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Testing:
 - TODO add e-to-e tests.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
18 files changed, 337 insertions(+), 30 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 5:

(9 comments)

Looks good, just adding a few more nits

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.h@107
PS5, Line 107:   /// Returns TRUE if it's in Kudu transaction.
nit: add: valid only after Open() succeeds


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.h@123
PS5, Line 123: std::string kudu_txn_token_
> nit: is it possible to add 'const' specifier and initialize the token in th
Deserialize can return an error status which we cant propagate during initialization, therefore anything like this is usually deferred to the Open() call


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc@1323
PS5, Line 1323:         if (InTransaction()) {
              :           AbortTransaction();
              :         } else if (InKuduTransaction()) {
              :           AbortKuduTransaction();
              :         }
              :         LOG(ERROR) << "ERROR Finalizing DML: " << status.GetDetail();
              :         return status;
              :       }
              :       if (InTransaction()) {
              :         // UpdateCatalog() succeeded and already committed the transaction for us.
              :         int64_t txn_id = GetTransactionId();
              :         if (!frontend_->UnregisterTransaction(txn_id).ok()) {
              :           LOG(ERROR) << Substitute("Failed to unregister transaction $0", txn_id);
              :         }
              :         ClearTransactionState();
              :         query_events_->MarkEvent("Transaction committed");
              :       } else if (InKuduTransaction()) {
              :         CommitKuduTransaction();
              :       }
wondering if we can consolidate the methods for Transactions and let them take care of doing the right thing for hive and kudu. will make it easier to read.

eg L1331-L1341 can be taken care of by FinalizeTransaction
L1323-L1327 can be taken care of by a single AbortTransaction()


http://gerrit.cloudera.org:8080/#/c/17553/5/common/thrift/ImpalaService.thrift
File common/thrift/ImpalaService.thrift:

http://gerrit.cloudera.org:8080/#/c/17553/5/common/thrift/ImpalaService.thrift@673
PS5, Line 673: KUDU_TRANSACTION
nit: how about ENABLE_KUDU_TRANSACTION


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@1124
PS5, Line 1124: setTransactionToken
nit: maybe call it setKuduTransactionToken


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/planner/TableSink.java
File fe/src/main/java/org/apache/impala/planner/TableSink.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/planner/TableSink.java@110
PS5, Line 110:    * Same as above, plus it takes an ACID write id in parameter 'writeId'.
nit: update comment to mention txnToken


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/planner/TableSink.java@116
PS5, Line 116: txnToken
nit: maybe name it kuduTxnToken to explicitly specify that this is kudu specific


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@1732
PS5, Line 1732: TransactionException
update comment for TransactionException to include kudu cases too.


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2242
PS5, Line 2242: txn.close();
nit: is close() something that can be put in a finally block. Like should it always be called even it rollback or commit fails? I understand that it is Auto-closeable but if we are calling it explicitly here and not using the try-with-resource code construct then might be good to just put it in the finally block.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 09 Jun 2021 22:00:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 8:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@19
PS8, Line 19: "insert" statements
I'm curious: would it make sense to support similar semantics for CREATE TABLE AS SELECT for Kudu tables or it was rather a deliberate decision not to do so?

Certainly, it's not possible to make the DDL phase of CREATE TABLE AS SELECT transactional as well, but may have some value to end user as well?


http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@22
PS8, Line 22: binary
nit: transaction token ?


http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@24
PS8, Line 24: binary
nit: transaction token ?


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@336
PS8, Line 336: class TestKuduTransaction(CustomClusterTestSuite):
> Another test case that might be worth adding is running two separate Impala
If it makes sense, one more test scenario to add might be to ensure the transaction handle kept by the front-end in KuduTransactionManager keeps the transaction open by heartbeating.  That probably requires adding an extra FIS and setting --txn_keepalive_interval_ms to something short (like 1000 instead of default 30000) to keep the runtime of the test short.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@346
PS8, Line 346:   def test_kudu_txn_commit(self, cursor, unique_database):
Since only INSERT/INSERT_IGNORE is supported for Kudu transactions, does it make sense to add scenarios to verify that even if ENABLE_KUDU_TRANSACTION is set, Impala isn't going to initiate a Kudu transaction in such cases?


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@366
PS8, Line 366:     query_options = {'debug_action': 'FIS_FAIL_KUDU_TABLE_SINK_BATCH:FAIL@1.0'}
> A more organic test might be to try inserting duplicate rows. That would ex
+1



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 00:02:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8930/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 00:18:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 19:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java@1722
PS19, Line 1722: don't
nit: "doesn't" or "does not", same below.


http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java@1818
PS19, Line 1818:         if (queryOptions.isEnable_kudu_transaction()) {
               :           // Kudu don't support transaction for DELETE/UPDATE statements now.
               :           if (analysisResult.isUpdateStmt()
               :               && analysisResult.getUpdateStmt().isTargetTableKuduTable()) {
               :             throw new TransactionException(
               :                 "Kudu don't support transaction for UPDATE statement.");
               :           } else if (analysisResult.isDeleteStmt()
               :               && analysisResult.getDeleteStmt().isTargetTableKuduTable()) {
               :             throw new TransactionException(
               :                 "Kudu don't support transaction for DELETE statement.");
               :           }
               :         }
Just curious -- today Kudu will return an error if trying to update or delete in the context of a transaction. Would it make sense to rely on that, and simply pass all operations through to Kudu? If any of them fail, they would be caught as row errors by Impala and fail the transaction anyway. That way, we wouldn't have to update this code much as transactions support expands.


http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> Changed the frontend code to return an error if UPDATE/UPSERT/DELETE is ran
Thank you for updating this!



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 19
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 18:43:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 1:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8853/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Jun 2021 19:18:11 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> I agree that having tests for these makes sense -- what I am suggesting is 
Query option ENABLE_KUDU_TRANSACTION apply to all the queries in a session. If we return error for update/upsert/delete when ENABLE_KUDU_TRANSACTION is set as true, then user have to set ENABLE_KUDU_TRANSACTION as true before insert and set it as false after insert. 
Maybe we should rename the option as ENABLE_KUDU_TRANSACTION_FOR_INSERT to avoid confusion? But later we have to rename when we support transaction for update.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 01:49:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#16). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 713 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/16
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 16
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 3:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8856/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 01:47:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

WIP IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - TODO add e-to-e tests.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
24 files changed, 371 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 17:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@386
PS17, Line 386:     self.execute_query(self._delete_query.format(table_name))
> Why don't we need to move this below L390? Wouldn't the delete queries fail
Remove this line after set ENABLE_KUDU_TRANSACTION=false


http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@417
PS17, Line 417:       self.execute_query(self._update_query.format(table_name))
              :       self.execute_query(self._upsert_query.format(table_name))
              :       self.execute_query(self._delete_query.format(table_name))
> Can we have three separate try/catches? Otherwise we'll just fail at the fi
It does not help with separate try/catch. The test throw assertion when getting first error.


http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@422
PS17, Line 422:       assert "Not implemented" in str(e)
> nit: consider re-checking the number of rows after failing the above operat
previous line already throw assertion, remove this line.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 23:54:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 10:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@19
PS8, Line 19: "insert" statements
> We should support transaction for CTAS. Could we work it in following up Ji
Sorry, CTAS is already supported. Will add test case for CTAS and update commit message.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 21:26:33 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@19
PS8, Line 19: "insert" statements
> I'm curious: would it make sense to support similar semantics for CREATE TA
We should support transaction for CTAS. Could we work it in following up Jira?


http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@22
PS8, Line 22: binary
> nit: transaction token ?
Done


http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@24
PS8, Line 24: binary
> nit: transaction token ?
Done


http://gerrit.cloudera.org:8080/#/c/17553/8/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17553/8/be/src/exec/kudu-scan-node-base.h@102
PS8, Line 102: kudu::client::KuduClient* kudu_client() const
> Well, now it's actually not const-correct [1].  I'd rather keep it non-cons
I changed it back as non const. Thanks for the reference.


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2237
PS5, Line 2237:  void
> nit: I'm curious why this is logged with 'error' priority when similar log 
changed to info


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2239
PS5, Line 2239: Transaction
> What if txn is null?  Can it happen only due to a programming error or it c
Added precondition for txn.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@336
PS8, Line 336: class TestKuduTransaction(CustomClusterTestSuite):
> Another test case that might be worth adding is running two separate Impala
Good suggestion. Added the test case.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@336
PS8, Line 336: class TestKuduTransaction(CustomClusterTestSuite):
> If it makes sense, one more test scenario to add might be to ensure the tra
Is --txn_keepalive_interval_ms a Kudu starting configuration? It seems that Impala end-to-end test do not restart Kudu cluster. Could we set it with Kudu API in runtime?


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@346
PS8, Line 346:   def test_kudu_txn_commit(self, cursor, unique_database):
> Since only INSERT/INSERT_IGNORE is supported for Kudu transactions, does it
Sorry, I did not get it how to add such scenario. Could you give more detail? I tried to run query "INSERT IGNORE" from impala-shell as below:
INSERT IGNORE INTO my-test VALUES (99, "John"); 
But got syntax error for IGNORE.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@366
PS8, Line 366:     query_options = {'debug_action': 'FIS_FAIL_KUDU_TABLE_SINK_BATCH:FAIL@1.0'}
> +1
Done


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@366
PS8, Line 366:     query_options = {'debug_action': 'FIS_FAIL_KUDU_TABLE_SINK_BATCH:FAIL@1.0'}
> A more organic test might be to try inserting duplicate rows. That would ex
Added a test cases with duplicated key values.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 17:38:34 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 20: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17553/20/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/17553/20/common/thrift/Query.thrift@684
PS20, Line 684: // Stores the transaction id if the query is transactional.
nit: update to mention that this is only used for HIVE ACID


http://gerrit.cloudera.org:8080/#/c/17553/20/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/20/tests/custom_cluster/test_kudu.py@540
PS20, Line 540: patch
nit: batch



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 00:38:40 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 14:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8912/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 00:27:05 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 16:

Build Failed 

https://jenkins.impala.io/job/gerrit-code-review-checks/8925/ : Initial code review checks failed. See linked job for details on the failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 16
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 19:31:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#19). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator. Impala return an error if UPDATE/UPSERT/DELETE are
   performed within a transaction context.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
28 files changed, 751 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/19
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 19
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 13:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8901/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Jun 2021 04:26:07 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 6:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8884/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 16:26:22 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> Query option ENABLE_KUDU_TRANSACTION apply to all the queries in a session.
I understand the concern around having to toggle ENABLE_KUDU_TRANSACTION. I still think that's less surprising behavior than transparently doing performing queries non-transactionally.

That said, I can get behind renaming this to ENABLE_KUDU_TRANSACTION_FOR_INSERTS. Is it acceptable to rename query option names like that? I imagine so if this is tagged as a development option. If not, I suppose we can always re-introduce the fuller ENABLE_KUDU_TRANSACTIONS once broader support lands.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 01:57:10 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#18). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 712 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/18
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 22: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 20:34:44 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 17:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     # Disable Kudu transactions and insert rows to the Kudu table. Should get same
              :     # results as transaction enabled.
              :     self.execute_query("set ENABLE_KUDU_TRANSACTION=false")
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
> I will move these tests to a separate test case.
Thank you for updating this!


http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@386
PS17, Line 386:     self.execute_query(self._delete_query.format(table_name))
Why don't we need to move this below L390? Wouldn't the delete queries fail as a part of a transaction?


http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@417
PS17, Line 417:       self.execute_query(self._update_query.format(table_name))
              :       self.execute_query(self._upsert_query.format(table_name))
              :       self.execute_query(self._delete_query.format(table_name))
Can we have three separate try/catches? Otherwise we'll just fail at the first query and skip testing the other two, right?


http://gerrit.cloudera.org:8080/#/c/17553/17/tests/custom_cluster/test_kudu.py@422
PS17, Line 422:       assert "Not implemented" in str(e)
nit: consider re-checking the number of rows after failing the above operations?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 21:30:21 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#17). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 714 insertions(+), 64 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/17
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 20:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8946/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Jun 2021 01:47:28 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 15:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8913/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 15
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 01:21:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#13). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 676 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/13
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> Ah seems I wasn't clear about my concern here. It seems like surprising beh
Tests for update/upsert/delete were suggested by Alexey. Tests verified Impala do not wrongly open Kudu transaction for update/upsert/delete when ENABLE_KUDU_TRANSACTION is set as true. The tests found a bug in Impala frontend code which opened transaction for upsert so it's helpful to detect error in Impala code.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 01:28:03 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 10:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8898/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 18:10:18 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@346
PS8, Line 346:   _create_parquet_table_query = "create table {0} (a int, b string) stored as parquet"
> Thank you for the clarification and making it work for CTAS queries as well
Tried UPDATE and UPSERT. UPDATE is working, but UPSERT failed since the transaction is opened and hit "Not implemented" error. Will fix it.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Jun 2021 23:33:28 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 9:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8897/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 18:01:50 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h@102
PS5, Line 102:   kudu::client::KuduClient* kudu_client() { return client_.get(); }
> Sorry I did not make it clear in the comment. I meant to make the member fu
Got it. Fixed it.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 20:54:18 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 20: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java@1818
PS19, Line 1818:       }
               :       return result;
               :     } catch (Exception e) {
               :       if (queryCtx.isSetTransaction_id()) {
               :         try {
               :           abortTransaction(queryCtx.getTransaction_id());
               :           timeline.markEvent("Transaction aborted");
               :         } catch (TransactionException te) {
               :           LOG.error("Could not abort transaction because: " + te.getMessage());
               :         }
               :       } else if (queryCtx.isIs_kudu_transactional()) {
               :         t
> Changed code to open transaction for delete/update/upsert. Also updated com
Thank you!



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Jun 2021 19:13:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/exec/kudu-table-sink.cc@330
PS13, Line 330: Retrun
> nit: Return
Done


http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/service/client-request-state.cc@955
PS13, Line 955: !status.ok()
> nit: maybe wrap in UNLIKELY?
Done


http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/service/client-request-state.cc@1350
PS13, Line 1350: AbortTransaction()
Should be AbortKuduTransaction. Fixed it.


http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     # Verify that multi-row transaction is not opened for "update".
              :     self.execute_query(self._update_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              :     # Verify that multi-row transaction is not opened for "upsert".
              :     self.execute_query(self._upsert_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              :     # Delete all rows from Kudu table. Also verify that multi-row transaction is not
              :     # opened for "delete".
              :     self.execute_query(self._delete_query.format(table_name))
> I'm hesitant to allow this case here -- it seems odd to proceed with the op
I will move these tests to a separate test case.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 00:52:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added e-to-e tests.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 450 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 6
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

WIP IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - TODO add e-to-e tests.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
23 files changed, 354 insertions(+), 48 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 7: Code-Review+1

(3 comments)

Looks good and thanks for the rework!

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h@102
PS5, Line 102:   kudu::client::KuduClient* kudu_client() { return client_.get(); }
> This function is called when calling Kudu API KuduScanToken::DeserializeInt
Sorry I did not make it clear in the comment. I meant to make the member function const. 

kudu::client::KuduClient* kudu_client() const { return client_.get(); }


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h@768
PS5, Line 768:   /// Aborts the Kudu transaction of this client request.
> This function write error log if it's failed. Caller cannot take additional
Done


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h@772
PS5, Line 772:   void CommitKuduTransaction();
> This function write error log if it's failed. Caller cannot take additional
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 20:03:43 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#11). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 586 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 11
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#14). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 700 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/14
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 4:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8859/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 06:04:08 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 3:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/client-request-state.h@756
PS2, Line 756: pen transaction
> nit: is this specific to Hive? If so, maybe mention that in these comments?
Done


http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/query-options.h@252
PS2, Line 252: ADVANCED
> Given this feature is considered experimental in Kudu, would it make sense 
Done


http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@206
PS2, Line 206: t
> nit: maybe prefix this with "kudu" so it's clear the transactionality is sp
Done


http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1827
PS2, Line 1827: timeline.markEvent("Kudu transaction aborted");
> nit: does this timeline have a query ID associated with it already? If not,
Done


http://gerrit.cloudera.org:8080/#/c/17553/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/3/fe/src/main/java/org/apache/impala/service/Frontend.java@1717
PS3, Line 1717: else if ((targetTable instanceof FeKuduTable)
              :             && queryOptions.isKudu_transaction()) {
              :           FeKuduTable kuduTable = (FeKuduTable) targetTable;
              :           KuduClient client = KuduUtil.getKuduClient(kuduTable.getKuduMasterHosts());
              :           KuduTransaction txn = null;
              :           try {
              :             // Open Kudu transaction.
              :             txn = client.newTransaction();
              :             timeline.markEvent("Kudu transaction opened with query id: "
              :                 + queryCtx.getQuery_id().toString());
              :             insertStmt.setTransactionToken(txn.serialize());
              :             kuduTxnManager_.addTransaction(queryCtx.getQuery_id(), txn);
              :             queryCtx.setIs_kudu_transactional(true);
              :           } catch (IOException e) {
              :             if (txn != null) txn.close();
              :             throw new TransactionException(e.getMessage());
              :           }
              :         }
> On Slack we discussed some considerations with regards to which daemon is i
Added comments in KuduTransactionManager


http://gerrit.cloudera.org:8080/#/c/17553/2/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/17553/2/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl@18
PS2, Line 18: -time_source=system_unsync
> In your tests, we'll need to add --enable_txn_system_client_init here as we
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 05:40:36 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 20:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17553/20/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/17553/20/common/thrift/Query.thrift@684
PS20, Line 684: // Stores the transaction id if the query is transactional.
> nit: update to mention that this is only used for HIVE ACID
fixed.


http://gerrit.cloudera.org:8080/#/c/17553/20/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/20/tests/custom_cluster/test_kudu.py@540
PS20, Line 540: patch
> nit: batch
fixed.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 00:52:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 443 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 7:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8885/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 17:17:00 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 14:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/exec/kudu-table-sink.cc@330
PS13, Line 330: Retrun
nit: Return


http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/13/be/src/service/client-request-state.cc@955
PS13, Line 955: !status.ok()
nit: maybe wrap in UNLIKELY?


http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     # Verify that multi-row transaction is not opened for "update".
              :     self.execute_query(self._update_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              :     # Verify that multi-row transaction is not opened for "upsert".
              :     self.execute_query(self._upsert_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              :     # Delete all rows from Kudu table. Also verify that multi-row transaction is not
              :     # opened for "delete".
              :     self.execute_query(self._delete_query.format(table_name))
I'm hesitant to allow this case here -- it seems odd to proceed with the operations if they cannot be performed transactionally. While potentially inconvenient, having to switch between true and false for ENABLE_KUDU_TRANSACTION, it seems less surprising to have a query fail and for the entire transaction to be aborted if we're performing any update/delete/upserts operations that fail when ENABLE_KUDU_TRANSACTION=true. What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 14
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 00:09:51 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> Tests for update/upsert/delete were suggested by Alexey. Tests verified Imp
I agree that having tests for these makes sense -- what I am suggesting is that the correct behavior may be to return an error if the user tries to update/delete/upsert when ENABLE_KUDU_TRANSACTION is set to true, rather than transparently ignoring it. As a user, if I've specified wanting to perform my queries transactionally, I might want to know when such a transaction is not possible, rather than proceeding and having the "transaction" be performed non-transactionally.

Specifically, if the user has set ENABLE_KUDU_TRANSACTION=true, should we fail the query immediately if any of the operations being performed are non-inserts?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 01:35:52 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#12). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 616 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/12
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 12
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#9). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 537 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/9/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/9/tests/custom_cluster/test_kudu.py@347
PS9, Line 347: \
flake8: E501 line too long (91 > 90 characters)



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 9
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Fri, 11 Jun 2021 17:39:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc@825
PS1, Line 825:   if (InTransaction()) {
             :     AbortTransaction();
             :   } else if (InKuduTransaction()) {
             :     AbortKuduTransaction();
             :   }
> I wonder the rational to decouple kudu tractions from impala ones. For exam
Currently Impala support transaction for insert-only ACID tables for insert statements. We cannot insert rows to insert-only ACID table and Kudu table in same query. We did not write a common function since their implementations are different. Kudu does not provide transaction ID.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
File fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java@54
PS1, Line 54: put(queryId, txn);
> Normally, a transaction can cover multiple queries as follows.
No, it does not support.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1717
PS1, Line 1717:  else if ((targetTable instanceof FeKuduTable)
              :             && queryOptions.isKudu_transaction()) {
              :           FeKuduTable kuduTable = (FeKuduTable) targetTable;
              :           KuduClient client = KuduUtil.getKuduClient(kuduTable.getKuduMasterHosts());
              :           KuduTransaction txn = null;
              :           try {
              :             // Open Kudu transaction.
              :             txn = client.newTransaction();
              :             timeline.markEvent("Kudu transaction opened with query id: "
              :                 + queryCtx.getQuery_id().toString());
              :             insertStmt.setTransactionToken(txn.serialize());
              :             kuduTxnManager_.addTransaction(queryCtx.getQuery_id(), txn);
              :             queryCtx.setIs_kudu_transactional(true);
              :           } catch (IOException e) {
              :             if (txn != null) txn.close();
              :             throw new TransactionException(e.getMessage());
              :           }
              :         }
> This covers insert/createTableAsSelect stmts which is very nice. 
Currently Kudu only support transaction for insert. They are working on update, but not finish yet.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@772
PS1, Line 772: TUniqueId queryId
> Would it be possible to pass in a kuduTransaction object direct here?  This
This function is called from C++ code so we cannot pass KuduTransaction object directly.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@782
PS1, Line 782: TUniqueId queryId
> Same comment as above.
This function is called from C++ code so we cannot pass KuduTransaction object directly.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Jun 2021 23:47:11 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.h
File be/src/exec/kudu-table-sink.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.h@123
PS5, Line 123: std::string kudu_txn_token_
nit: is it possible to add 'const' specifier and initialize the token in the initializer's list during construction?


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.cc
File be/src/exec/kudu-table-sink.cc:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.cc@90
PS5, Line 90:  else {
            :     kudu_txn_token_ = "";
            :   }
nit: I guess this part isn't strictly necessary given that std::string initializes as an empty string by its default constructor.


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-table-sink.cc@380
PS5, Line 380: if (txn_.get() != nullptr)
nit: this could be omitted.  At least, this code resets 'session_' and 'client_' regardless of their current state.


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc@1333
PS5, Line 1333:         int64_t txn_id = GetTransactionId();
              :         if (!frontend_->UnregisterTransaction(txn_id).ok()) {
              :           LOG(ERROR) << Substitute("Failed to unregister transaction $0", txn_id);
              :         }
              :         ClearTransactionState();
nit: I'm curious why this is performed only for Hive, but not for a Kudu transaction.  Does it make sense to add a comment to clarify on this?


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@1125
PS5, Line 1125: if (kuduTxnToken != null)
nit: what if txn token is null?  Is that a programming error or that's one of the intended use cases here?  If the former, maybe add a precondition check here?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 23:59:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/14/tests/custom_cluster/test_kudu.py@388
PS14, Line 388:     self.execute_query(self._delete_query.format(table_name))
              :     self.execute_query(self._insert_3_rows_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(3,)]
              :     self.execute_query(self._insert_select_query.format(table_name))
              :     cursor.execute(self._row_num_query.format(table_name))
              :     assert cursor.fetchall() == [(103,)]
              : 
              :   @pytest.mark.execute_serially
              :   @SkipIfKudu.no_hybrid_clock
              :   def test_kudu_txn_not_implemented(self, cursor, unique_data
> Thank you for updating this!
Ah seems I wasn't clear about my concern here. It seems like surprising behavior that a user specifies ENABLE_KUDU_TRANSACTION=true, but under the hood, we don't perform a transaction for the update/delete/upsert operations. As a user, I'd expect that if my query could not be performed transactionally, the query should fail -- is that not the expectation of Impala users?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 18
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 00:44:29 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 22:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7265/ DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 14:33:15 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 5:

(11 comments)

Looks good!

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/exec/kudu-scan-node-base.h@102
PS5, Line 102:   kudu::client::KuduClient* kudu_client() { return client_.get(); }
nit. should be const?


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/runtime/exec-env.h
File be/src/runtime/exec-env.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/runtime/exec-env.h@267
PS5, Line 267: struct KuduClientPtr {
             :     kudu::client::sp::shared_ptr<kudu::client::KuduClient> kudu_client;
             :   };
Can we get rid off struct and use the member directly here?


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h@768
PS5, Line 768:   /// Aborts the Kudu transaction of this client request.
nit May explain what can happen when the action is successful or not.

It may be a good idea to return bool to indicate the status.


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.h@772
PS5, Line 772:   void CommitKuduTransaction();
Same as above.


http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc@825
PS1, Line 825:   if (InTransaction()) {
             :     AbortTransaction();
             :   } else if (InKuduTransaction()) {
             :     AbortKuduTransaction();
             :   }
> Currently Impala support transaction for insert-only ACID tables for insert
Thanks for the clarification.   Maybe we add a comment here to explain the rational that these two types of transaction do not overlap.


http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc@1624
PS1, Line 1624: transaction_closed_
If this data member is for both the impala and kudu transaction, how do we make sure only one type of transaction can be in progress at any point of time?


http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/5/be/src/service/client-request-state.cc@1323
PS5, Line 1323:    if (InTransaction()) {
              :           AbortTransaction();
              :         } else if (InKuduTransaction()) {
              :           AbortKuduTransaction();
              :         }
nit. This code block is identical to the one at line 825-828 above. Maybe refactor.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
File fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java@54
PS1, Line 54: checkNotNull(txn);
> Right, IIUC Impala doesn't have this capability right now, but we may consi
Okay. Thanks for the explanation.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1717
PS1, Line 1717:  else if ((targetTable instanceof FeKuduTable)
              :             && queryOptions.isKudu_transaction()) {
              :           FeKuduTable kuduTable = (FeKuduTable) targetTable;
              :           KuduClient client = KuduUtil.getKuduClient(kuduTable.getKuduMasterHosts());
              :           KuduTransaction txn = null;
              :           try {
              :             // Open Kudu transaction.
              :             txn = client.newTransaction();
              :             timeline.markEvent("Kudu transaction opened with query id: "
              :                 + queryCtx.getQuery_id().toString());
              :             insertStmt.setTransactionToken(txn.serialize());
              :             kuduTxnManager_.addTransaction(queryCtx.getQuery_id(), txn);
              :             queryCtx.setIs_kudu_transactional(true);
              :           } catch (IOException e) {
              :             if (txn != null) txn.close();
              :             throw new TransactionException(e.getMessage());
              :           }
              :         }
> Currently Kudu only support transaction for insert. They are working on upd
Done


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@772
PS1, Line 772: byte[] thriftQuer
> This function is called from C++ code so we cannot pass KuduTransaction obj
Done


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@782
PS1, Line 782: ery.
> This function is called from C++ code so we cannot pass KuduTransaction obj
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 19:13:22 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert", "CTAS" and "UPDATE/UPSERT/DELETE" statements
   by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
   Since Kudu does not support transaction for "UPDATE/UPSERT/DELETE"
   statements now, Kudu returns error which causes transaction to be
   aborted.
 - Coordinator commits the transaction if everything goes well.
   Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Reviewed-on: http://gerrit.cloudera.org:8080/17553
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
29 files changed, 794 insertions(+), 67 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 23
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 19:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java@1722
PS19, Line 1722: don't
> nit: "doesn't" or "does not", same below.
The code were removed.


http://gerrit.cloudera.org:8080/#/c/17553/19/fe/src/main/java/org/apache/impala/service/Frontend.java@1818
PS19, Line 1818:         if (queryOptions.isEnable_kudu_transaction()) {
               :           // Kudu don't support transaction for DELETE/UPDATE statements now.
               :           if (analysisResult.isUpdateStmt()
               :               && analysisResult.getUpdateStmt().isTargetTableKuduTable()) {
               :             throw new TransactionException(
               :                 "Kudu don't support transaction for UPDATE statement.");
               :           } else if (analysisResult.isDeleteStmt()
               :               && analysisResult.getDeleteStmt().isTargetTableKuduTable()) {
               :             throw new TransactionException(
               :                 "Kudu don't support transaction for DELETE statement.");
               :           }
               :         }
> Just curious -- today Kudu will return an error if trying to update or dele
Changed code to open transaction for delete/update/upsert. Also updated commit message and test cases.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 19
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 17 Jun 2021 01:17:45 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 19:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8939/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 19
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Wed, 16 Jun 2021 16:05:10 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 17:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8927/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 17
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 21:00:37 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17553/8/be/src/exec/kudu-scan-node-base.h
File be/src/exec/kudu-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/17553/8/be/src/exec/kudu-scan-node-base.h@102
PS8, Line 102: kudu::client::KuduClient* kudu_client() const
Well, now it's actually not const-correct [1].  I'd rather keep it non-const, but it's up to you.

If you want it to be const-correct, it should have the following signature:

  const kudu::client::KuduClient* kudu_client() const;

[1] https://chromium.googlesource.com/chromium/src/+/master/styleguide/c++/const.md


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2237
PS5, Line 2237:  void
nit: I'm curious why this is logged with 'error' priority when similar log in commitKuduTransaction() is of 'info' priority? Is that because rolling back a transaction happens only due to an error, and it's not possible to initiate rolling back a transaction explicitly?


http://gerrit.cloudera.org:8080/#/c/17553/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2239
PS5, Line 2239: Transaction
What if txn is null?  Can it happen only due to a programming error or it can happen in other cases?  If the former, does it make sense to add a precondition?  If the latter, does it deserve to be logged?



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 21:35:57 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 21:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8989/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 01:16:21 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 8:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8892/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 21:33:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added e-to-e tests.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 443 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 7
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 21: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 01:44:31 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

WIP IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer by apllying a
Thomas's old patch since we have to pass KuduClient as shared pointer
for Kudu API KuduTransaction::Deserialize().

Testing:
 - TODO add e-to-e tests.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
23 files changed, 364 insertions(+), 47 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#15). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" and "CTAS" statements by Impala's frontend of
   coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 714 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/15
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 15
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 13: Code-Review+1

(3 comments)

Thank you very much for working on this!  This patch looks good to me.

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17553/8//COMMIT_MSG@19
PS8, Line 19: "insert" and "CTAS"
> Sorry, CTAS is already supported. Will add test case for CTAS and update co
Yep, I was trying to understand whether not supporting multi-row Kudu transactions was something deliberate or the presence of CTAS for Kudu was just overlooked, and I was wandering whether it makes sense to support multi-row transactions in the context of CTAS queries for Impala+Kudu.

Thank you for adding support for that!  I think there is value in that for Impala+Kudu users.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@336
PS8, Line 336: 
> Added test case for heartbeating.
It's possible to change --txn_keepalive_interval_ms via the `kudu tserver set_flag` CLI tool in addition to setting the flag at startup for kudu-tserver processes.

Probably, the latter is less hustle since if using the `kudu tserver set_flag` approach, it would be necessary to call the following for every tablet server to set the interval to 1 second:

  `kudu tserver set_flag <tserver_rpc_address> txn_keepalive_interval_ms 1000`

I see you already added a proper setup step into TestKuduTxnKeepalive::setup_class() method.  Thank you!


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@346
PS8, Line 346:   _create_parquet_table_query = "create table {0} (a int, b string) stored as parquet"
> Searched the document. INSERT IGNORE is not supported now.
Thank you for the clarification and making it work for CTAS queries as well!  I didn't realize INSERT_IGNORE isn't supported for Kudu+Impala.

My point was to make sure that a multi-row Kudu transaction is opened by Impala only in case of INSERT/INSERT_IGNORE, but not for UPDATE/UPSERT/DELETE even if ENABLE_KUDU_TRANSACTION is set to true.  Kudu would return an error if UPDATE/DELETE is performed within a multi-row transaction context.

I see you already added a case that verifies that for the DELETE, so now the DELETE is now covered.  If it makes sense, consider adding a sub-scenario for UPDATE/UPSERT.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 14 Jun 2021 21:53:48 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/client-request-state.h
File be/src/service/client-request-state.h:

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/client-request-state.h@756
PS2, Line 756: pen transaction
nit: is this specific to Hive? If so, maybe mention that in these comments? Same with AbortTransaction, GetTransactionId, ClearTransactionState?


http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/17553/2/be/src/service/query-options.h@252
PS2, Line 252: ADVANCED
Given this feature is considered experimental in Kudu, would it make sense to mark this as DEVELOPMENT? Is there any kind of experimental tag that Impala exposes?


http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java@206
PS2, Line 206: t
nit: maybe prefix this with "kudu" so it's clear the transactionality is specific to Kudu? Same elsewhere? Especially so there's no risk of confusing this with Hive transactions (or the next system's transactions).


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
File fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java@54
PS1, Line 54: put(queryId, txn);
> No, it does not support.
Right, IIUC Impala doesn't have this capability right now, but we may consider extending Kudu's API to include a transaction ID rather than relying on query ID. As far as I can tell, each Impala query does correspond to a single transaction, so this key seems to be ok for now.


http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/2/fe/src/main/java/org/apache/impala/service/Frontend.java@1827
PS2, Line 1827: timeline.markEvent("Kudu transaction aborted");
nit: does this timeline have a query ID associated with it already? If not, maybe add the query ID here too?


http://gerrit.cloudera.org:8080/#/c/17553/3/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/3/fe/src/main/java/org/apache/impala/service/Frontend.java@1717
PS3, Line 1717: else if ((targetTable instanceof FeKuduTable)
              :             && queryOptions.isKudu_transaction()) {
              :           FeKuduTable kuduTable = (FeKuduTable) targetTable;
              :           KuduClient client = KuduUtil.getKuduClient(kuduTable.getKuduMasterHosts());
              :           KuduTransaction txn = null;
              :           try {
              :             // Open Kudu transaction.
              :             txn = client.newTransaction();
              :             timeline.markEvent("Kudu transaction opened with query id: "
              :                 + queryCtx.getQuery_id().toString());
              :             insertStmt.setTransactionToken(txn.serialize());
              :             kuduTxnManager_.addTransaction(queryCtx.getQuery_id(), txn);
              :             queryCtx.setIs_kudu_transactional(true);
              :           } catch (IOException e) {
              :             if (txn != null) txn.close();
              :             throw new TransactionException(e.getMessage());
              :           }
              :         }
On Slack we discussed some considerations with regards to which daemon is in charge of creating vs aborting vs committing the transaction. It'd be nice if either here, or perhaps in the KuduTransactionManager, we could mention those considerations. Especially if it explains why the commit flow doesn't look identical to that of Hive transactions.


http://gerrit.cloudera.org:8080/#/c/17553/2/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
File testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl:

http://gerrit.cloudera.org:8080/#/c/17553/2/testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl@18
PS2, Line 18: -time_source=system_unsync
In your tests, we'll need to add --enable_txn_system_client_init here as well.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 3
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 02:11:55 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 2:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8855/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 2
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 00:16:59 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 20: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Mon, 21 Jun 2021 23:07:46 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#10). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
26 files changed, 537 insertions(+), 63 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 10
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#20). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert", "CTAS" and "UPDATE/UPSERT/DELETE" statements
   by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
   Since Kudu does not support transaction for "UPDATE/UPSERT/DELETE"
   statements now, Kudu returns error which causes transaction to be
   aborted.
 - Coordinator commits the transaction if everything goes well.
   Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
29 files changed, 792 insertions(+), 66 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/20
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 20
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 22: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 22
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 14:33:14 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@336
PS8, Line 336: class TestKuduTransaction(CustomClusterTestSuite):
Another test case that might be worth adding is running two separate Impala queries that are inserting to the same Kudu partitions. One of the queries should fail, given Kudu's current implementation of partition locking.


http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@366
PS8, Line 366:     query_options = {'debug_action': 'FIS_FAIL_KUDU_TABLE_SINK_BATCH:FAIL@1.0'}
A more organic test might be to try inserting duplicate rows. That would exercise Impala's handling of Kudu row errors as well. Same below.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 8
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 10 Jun 2021 21:42:38 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 13:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py
File tests/custom_cluster/test_kudu.py:

http://gerrit.cloudera.org:8080/#/c/17553/8/tests/custom_cluster/test_kudu.py@346
PS8, Line 346:   _create_parquet_table_query = "create table {0} (a int, b string) stored as parquet"
> Tried UPDATE and UPSERT. UPDATE is working, but UPSERT failed since the tra
Fixed the bug for UPSERT.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 13
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 15 Jun 2021 00:02:46 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Impala Public Jenkins (Code Review)" <ge...@cloudera.org>.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 5:

Build Successful 

https://jenkins.impala.io/job/gerrit-code-review-checks/8862/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 5
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Tue, 08 Jun 2021 17:51:39 +0000
Gerrit-HasComments: No

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

WIP IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert" statements by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a binary and pass to
   executors.
 - Executors deserialize the binary and ingest via that transaction
   handle.
 - Coordinator commits the transaction after updating catalog if
   everything goes well. Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - TODO add e-to-e tests.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
24 files changed, 365 insertions(+), 52 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 4
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] WIP IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Qifan Chen (Code Review)" <ge...@cloudera.org>.
Qifan Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: WIP IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 1:

(5 comments)

Just a few comments. Looks good.

http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc
File be/src/service/client-request-state.cc:

http://gerrit.cloudera.org:8080/#/c/17553/1/be/src/service/client-request-state.cc@825
PS1, Line 825:   if (InTransaction()) {
             :     AbortTransaction();
             :   } else if (InKuduTransaction()) {
             :     AbortKuduTransaction();
             :   }
I wonder the rational to decouple kudu tractions from impala ones. For example, for the following query, if it is protected via a transaction, you would need to protect both tables in the same transaction. 

insert into kudu_t as select * from impala_t


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
File fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java@54
PS1, Line 54: put(queryId, txn);
Normally, a transaction can cover multiple queries as follows.

begin work;
<q1>
<q2>
...
commit;

I wonder if kudu/impala supports such semantics.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1717
PS1, Line 1717:  else if ((targetTable instanceof FeKuduTable)
              :             && queryOptions.isKudu_transaction()) {
              :           FeKuduTable kuduTable = (FeKuduTable) targetTable;
              :           KuduClient client = KuduUtil.getKuduClient(kuduTable.getKuduMasterHosts());
              :           KuduTransaction txn = null;
              :           try {
              :             // Open Kudu transaction.
              :             txn = client.newTransaction();
              :             timeline.markEvent("Kudu transaction opened with query id: "
              :                 + queryCtx.getQuery_id().toString());
              :             insertStmt.setTransactionToken(txn.serialize());
              :             kuduTxnManager_.addTransaction(queryCtx.getQuery_id(), txn);
              :             queryCtx.setIs_kudu_transactional(true);
              :           } catch (IOException e) {
              :             if (txn != null) txn.close();
              :             throw new TransactionException(e.getMessage());
              :           }
              :         }
This covers insert/createTableAsSelect stmts which is very nice. 

Just wonder if we need to worry about other IUD cases, such as delete, or update.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java
File fe/src/main/java/org/apache/impala/service/JniFrontend.java:

http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@772
PS1, Line 772: TUniqueId queryId
Would it be possible to pass in a kuduTransaction object direct here?  This helps remove the restriction of one to one mapping of query id to transaction.


http://gerrit.cloudera.org:8080/#/c/17553/1/fe/src/main/java/org/apache/impala/service/JniFrontend.java@782
PS1, Line 782: TUniqueId queryId
Same comment as above.



-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 1
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Comment-Date: Mon, 07 Jun 2021 21:14:59 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Wenzhe Zhou (Code Review)" <ge...@cloudera.org>.
Wenzhe Zhou has uploaded a new patch set (#21). ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................

IMPALA-10557: Support Kudu's multi-row transaction

Kudu added multi-row transaction so Impala could run query that inserts
multiple rows into Kudu's table in the context of a single transaction.
Kudu provides new Java/C++ client APIs to open/commit/rollback
transaction, create session with transaction, serialize/deserialize
metadata of transaction object. Kudu transaction object has built-in
heartbeater.

This patch added Impala support to use Kudu's multiple-row transaction.
 - Added a new query option to enable Kudu's transaction.
 - When the query option is set, a new Kudu transaction should be
   started for "insert", "CTAS" and "UPDATE/UPSERT/DELETE" statements
   by Impala's frontend of coordinator.
 - The Kudu transaction objects are kept in KuduTransactionManager until
   the transactions are going to be aborted or committed.
 - Frontend serialize the transaction metadata into a transaction token
   and pass to executors.
 - Executors deserialize the transaction token and ingest via that
   transaction handle. For Kudu session in the context of a transaction,
   return the first error if there are any pending errors for the Kudu
   session so that the Kudu transaction will be aborted.
   Since Kudu does not support transaction for "UPDATE/UPSERT/DELETE"
   statements now, Kudu returns error which causes transaction to be
   aborted.
 - Coordinator commits the transaction if everything goes well.
   Otherwise, aborts the transaction.

Also changed code to store KuduClient as shared pointer since KuduClient
has to be passed as a shared pointer when KuduTransaction::Deserialize()
is called.

Testing:
 - Added new e-to-e tests for Kudu transaction.
 - Passed core test.

Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
---
M be/src/exec/kudu-scan-node-base.h
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-table-sink.h
M be/src/exprs/kudu-partition-expr.cc
M be/src/exprs/kudu-partition-expr.h
M be/src/runtime/exec-env.cc
M be/src/runtime/exec-env.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/frontend.cc
M be/src/service/frontend.h
M be/src/service/query-options.cc
M be/src/service/query-options.h
M common/thrift/DataSinks.thrift
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M fe/src/main/java/org/apache/impala/analysis/DeleteStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/ModifyStmt.java
M fe/src/main/java/org/apache/impala/analysis/UpdateStmt.java
A fe/src/main/java/org/apache/impala/common/KuduTransactionManager.java
M fe/src/main/java/org/apache/impala/common/TransactionException.java
M fe/src/main/java/org/apache/impala/planner/KuduTableSink.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/service/JniFrontend.java
M testdata/cluster/node_templates/common/etc/kudu/master.conf.tmpl
M testdata/cluster/node_templates/common/etc/kudu/tserver.conf.tmpl
M tests/custom_cluster/test_kudu.py
29 files changed, 794 insertions(+), 67 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/53/17553/21
-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>

[Impala-ASF-CR] IMPALA-10557: Support Kudu's multi-row transaction

Posted by "Bikramjeet Vig (Code Review)" <ge...@cloudera.org>.
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/17553 )

Change subject: IMPALA-10557: Support Kudu's multi-row transaction
......................................................................


Patch Set 21: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17553
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I876ada48991afdff5d61b5d6a0417571aba7cb34
Gerrit-Change-Number: 17553
Gerrit-PatchSet: 21
Gerrit-Owner: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Bikramjeet Vig <bi...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <im...@cloudera.com>
Gerrit-Reviewer: Qifan Chen <qc...@cloudera.com>
Gerrit-Reviewer: Wenzhe Zhou <wz...@cloudera.com>
Gerrit-Comment-Date: Thu, 24 Jun 2021 14:32:13 +0000
Gerrit-HasComments: No