You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Alexey Serbin (Code Review)" <ge...@cloudera.org> on 2020/12/03 04:38:13 UTC

[kudu-CR] [txn] update not-the-leader retry logic in TxnSystemClient

Alexey Serbin has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16815


Change subject: [txn] update not-the-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-the-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of staled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 148 insertions(+), 65 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/16815/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 1
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@188
PS2, Line 188:   if (result.status.IsAborted()) {
> +1
I guess that was supposed to be handled in the code below with TabletServerErrorPB::NOT_THE_LEADER.

Handling IllegalState() was a major point of pain here because TxnStatusManager returns IllegalState() if transaction is not in right state, and just assume that that's the leader change is not correct.

It seems I need to clarify on this: probably, that means we'll need to introduce error codes specific to TxnStatusManager interface.



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 02:59:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@188
PS2, Line 188:   if (result.status.IsAborted()) {
> I guess that was supposed to be handled in the code below with TabletServer
Right, I am running into similar issues on the TxnParticipant side. I think one approach would be to unwrap resp.error().code() early, selecting an appropriate result as below, and if not, then check the error status, the observation being that error codes are much more fine-grained than Statuses. E.g. for the TxnStatusManager, in addition to returning IllegalState(), we could set ts_error to TXN_ILLEGAL_STATE.



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 03:22:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 06:06:38 +0000
Gerrit-HasComments: No

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-a-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.  Also, TXN_ILLEGAL_STATE error
code is added into CoordinateTransactionResponsePB responses when
TxnStatusManager finds the requested transaction in a wrong state.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of stalled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Reviewed-on: http://gerrit.cloudera.org:8080/16815
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Hao Hao <ha...@cloudera.com>
Tested-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 178 insertions(+), 74 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Hao Hao: Looks good to me, approved
  Alexey Serbin: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 6
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16815

to look at the new patch set (#5).

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-a-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.  Also, TXN_ILLEGAL_STATE error
code is added into CoordinateTransactionResponsePB responses when
TxnStatusManager finds the requested transaction in a wrong state.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of stalled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 178 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/16815/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16815/3/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/3/src/kudu/transactions/coordinator_rpc.cc@211
PS3, Line 211:         DLOG(FATAL) << "unhandled code: " << TabletServerErrorPB::Code_Name(code);
> Should we fall through to the more general error handling code below? E.g. 
Good catch!

Done.



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 06:02:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@188
PS2, Line 188:   if (result.status.IsAborted()) {
> Right, I am running into similar issues on the TxnParticipant side. I think
OK, I added corresponding code into TxnStatusManager.


http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@194
PS2, Line 194: TABLET_NOT_FOUND
> nit: maybe move this comment down by TABLET_NOT_FOUND?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 05:04:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16815

to look at the new patch set (#4).

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-a-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of stalled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 179 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/16815/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 4
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2: Verified+1

unrelated test failure in ClientStressTest.TestStartScans


-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Fri, 04 Dec 2020 04:26:01 +0000
Gerrit-HasComments: No

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16815/3/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/3/src/kudu/transactions/coordinator_rpc.cc@211
PS3, Line 211:         DLOG(FATAL) << "unhandled code: " << TabletServerErrorPB::Code_Name(code);
Should we fall through to the more general error handling code below? E.g. for UNKNOWN_ERRORs?



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 05:21:07 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 07:40:19 +0000
Gerrit-HasComments: No

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2:

(1 comment)

Overall looks good to me.

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@188
PS2, Line 188:   if (result.status.IsAborted()) {
> Should we also be handling IllegalState? E.g. if we change leaders as we ar
+1



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 01:19:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16815

to look at the new patch set (#2).

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-a-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of staled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 148 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/16815/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 5: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 06:34:57 +0000
Gerrit-HasComments: No

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 5: Verified+1

I'll take care of the issue reported by ASAN separately -- it's not exactly related to the retrial logic, as far as I can see.


-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Tue, 08 Dec 2020 07:44:05 +0000
Gerrit-HasComments: No

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16815 )

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc
File src/kudu/transactions/coordinator_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@188
PS2, Line 188:   if (result.status.IsAborted()) {
Should we also be handling IllegalState? E.g. if we change leaders as we are writing and fail in CheckLeadershipAndBindTerm?


http://gerrit.cloudera.org:8080/#/c/16815/2/src/kudu/transactions/coordinator_rpc.cc@194
PS2, Line 194: TABLET_NOT_FOUND
nit: maybe move this comment down by TABLET_NOT_FOUND?



-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 2
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Comment-Date: Mon, 07 Dec 2020 06:52:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Andrew Wong, Hao Hao, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16815

to look at the new patch set (#3).

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................

[txn] update not-a-leader retry logic in TxnSystemClient

This patch updates the retry logic in TxnSystemClient to handle
the case when TxnStatusManager returns Status::ServiceUnavailable().

In addition, this patch adds extra provision to check for tablet replica
status in StatusManager::CheckTxnStatusDataLoadedUnlocked().
The signatures of the involved methods have been updated as well.

This patch also enhances FinalizeCommitTransaction() to report
on errors from underlying tablet, if any.

I didn't add new tests scenarios in this patch: they are present in
follow-up patches which introduce the tracking of stalled transactions.

Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
---
M src/kudu/client/client-test.cc
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/coordinator_rpc.cc
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_service.cc
7 files changed, 175 insertions(+), 74 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/15/16815/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 3
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] [txn] update not-a-leader retry logic in TxnSystemClient

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has removed a vote on this change.

Change subject: [txn] update not-a-leader retry logic in TxnSystemClient
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16815
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I1b8abb27c7678c5c616a325343620902f6cbfd59
Gerrit-Change-Number: 16815
Gerrit-PatchSet: 5
Gerrit-Owner: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)