You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Hao Hao (Code Review)" <ge...@cloudera.org> on 2020/10/25 02:19:05 UTC

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Hao Hao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16648


Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

(wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

wip: add more tests.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
13 files changed, 398 insertions(+), 22 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:       FLAGS_txn_status_tablet_inject_load_failure_error = true;
> You can also use --env_inject_eio_globs to target specific directories. IIR
You mentioned on slack that this won't do io unless we flush. It's probably more trouble than it's worth to use io errors here.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 06:40:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 6:

(27 comments)

A bunch of nits but overall looking much better!

http://gerrit.cloudera.org:8080/#/c/16648/5/src/2.patch
File src/2.patch:

PS5: 
Remove this.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/3.patch
File src/3.patch:

PS5: 
Remove this.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS5, Line 447:   SleepFor(MonoDelta::FromMilliseconds(5 * keepalive_interval_ms));
Do we need this if we're repeating in ASSERT_EVENTUALLY?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744
PS6, Line 744: ;
nit: drop extra semicolon


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751
PS6, Line 751: threads.emplace_back([&] {
What does the multithreading add here? Seems like it complicates things since we'd have to worry about monotonically increasing transaction IDs.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758
PS6, Line 758:         if (s.ok()) {
What failures are we expecting?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748
PS6, Line 748:   std::atomic<int64_t> next_thread_id = 0;
             :   vector<int> txn_id_index = { 1, 6, 11, 16, 21};
             :   for (int t = 0; t < 5; t++) {
             :     threads.emplace_back([&] {
             :       int thread_id = next_thread_id++;
             :       CHECK_LT(thread_id, txn_id_index.size());
             :       int start_txn_id = txn_id_index[thread_id];
             :       for (int i = start_txn_id; i < start_txn_id + 5; i++) {
             :         TxnStatusEntryPB txn_status;
             :         Status s = txn_sys_client_->BeginTransaction(i, kUser);
             :         if (s.ok()) {
             :           // Make sure a re-election happens before the following read.
             :           SleepFor(MonoDelta::FromMilliseconds(3 * FLAGS_raft_heartbeat_interval_ms));
             :           s = txn_sys_client_->GetTransactionStatus(i, kUser, &txn_status);
             :           CHECK_OK(s);
             :           CHECK(txn_status.has_user());
             :           CHECK_STREQ(kUser, txn_status.user().c_str());
             :           CHECK(txn_status.has_state());
             :           CHECK_EQ(TxnStatePB::OPEN, txn_status.state());
             :         }
             :       }
             :     });
             :   }
nit: would be easier to read as:

const int kTxnsPerThread = 5;
const int kNumThreads = 5;
vector<threads> threads;
for (int t = 0; t < kNumThreads; t++) {
  threads.emplace_back([t, &] {
    for (int i = 0; i < kNumTxnsPerThreads; i++) {
      // do stuff on txn (t * kNumTxnsPerThreads + i)
    }
  }
}


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@395
PS5, Line 395: "Received notification of TxnStatusTablet state change "
             :                              << "but the raft consensus is not initialized. Tablet ID: "
             :                              << tablet_id << ". Reason: " << reason;
nit: use Substitute. Same below.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@407
PS5, Line 407:   <<
nit: spacing alignment


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/tablet_replica.cc@997
PS5, Line 997: state_ == STOPPING || state_ == STOPPED
I'm curious whether the code would work if we reused the != RUNNING condition. If so, we could use the same function at both callsites.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403
PS6, Line 403: << 
nit: maybe add a message here too indicating what's going on


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472
PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down or shutdown. State: $0",
             :                                TabletStatePB_Name(state_));
nit: might be cleaner to let the callers of these otherwise simple functions handle logging.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979
PS6, Line 979:   if (!txn_coordinator_) {
             :     return false;
             :   }
nit: maybe only call this if txn_coordinator_ is set and change this to a DCHECK? Same below.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997
PS6, Line 997: state_ == STOPPING || state_ == STOPPED) {
nit: I'm curious, why not reuse the != RUNNING condition from above? A comment would be nice.


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/txn_coordinator.h
File src/kudu/tablet/txn_coordinator.h:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/tablet/txn_coordinator.h@50
PS5, Line 50: It's about reload tablet metadata 
nit: how about "This task loads the contents of the tablet into memory and does other work..."


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109
PS6, Line 109: but some operations
             :     // may still be legal.
I might be missing something but is this fact used anywhere? What operations are still legal?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106
PS6, Line 106:     const Status& replica_status() const { return replica_status_; }
             : 
             :     // Leadership status of the transaction status manager. If not OK, the
             :     // transaction status manager is not the leader, but some operations
             :     // may still be legal.
             :     const Status& leader_status() const {
             :       return leader_status_;
             :     }
nit: Are these ever used?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118
PS6, Line 118: if (!replica_status_.ok()) {
             :         return replica_status_;
             :       }
             :       return leader_status_;
nit: a bit simpler as

 RETURN_NOT_OK(replica_status_);
 return leader_status_;


http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/5/src/kudu/transactions/txn_status_manager.h@230
PS5, Line 230:   // data is loaded and the replic
nit: maybe call it LoadFromTabletForTests() then


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83: If this time is exceeded, the "
            :              "node crashes."
If the consequence is so severe, maybe we should set this to be something like 5 minutes? Or, rather than crashing, why not just return an error to the client? Or would it be better to just wait?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@217
PS6, Line 217: leader_ready_term != initial_term_ ||
             :                     !leader_shared_lock_.owns_lock())
nit: Curious, doesn't this mean that we've changed leaders? The error message doesn't seem to reflect that.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@235
PS6, Line 235: NOT_THE_LEADER
Should this only be set if leader_status_ is set?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@320
PS6, Line 320: RETURN_NOT_OK
nit: just return this?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@369
PS6, Line 369:   }
nit: maybe log something about beginning to wait?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@414
PS6, Line 414: failed
nit: "interrupted" would be less alarming, given this is somewhat normal


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@430
PS6, Line 430: Substitute("Tablet")
nit: remove


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc@1428
PS6, Line 1428: RunTx
nit: maybe name these ShouldRun...

Run...Task makes it seem like the functions actually run the task, rather than determine whether we should run.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 12 Jan 2021 23:52:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725
PS3, Line 725:     FLAGS_leader_failure_max_missed_heartbeat_periods = 1;
             :     FLAGS_raft_heartbeat_interval_ms = 30;
I'm curious: do this settings provide stable behavior under TSAN/ASAN?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@749
PS3, Line 749: (int t = 0; t < 5; t++)
Any idea how many re-elections happen while these transaction are being started?  In other words, maybe make every thread starting several transactions in succession, spaced by some interval (e.g., 3 * raft_heartbeat_interval_ms) for better coverage?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@756
PS3, Line 756:           ASSERT_OK(s);
             :           ASSERT_TRUE(txn_status.has_user());
             :           ASSERT_STREQ(kUser, txn_status.user().c_str());
             :           ASSERT_TRUE(txn_status.has_state());
             :           ASSERT_EQ(TxnStatePB::OPEN, txn_status.state());
I'm curious how does this scenario behave when any of this fails?  How to join the threads in such case?

I remember one of the approaches to address that is either using CHECK instead of ASSERT, so the test crashes, or simply increment failure count and then report on those once every thread is complete.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 06 Jan 2021 19:12:35 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 9:

(7 comments)

> Patch Set 9:
> 
> (7 comments)
> 
> IWYU isn't happy yet:
> 
> >>> Fixing #includes in '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc'
> @@ -32,6 +32,7 @@
>  #include "kudu/common/wire_protocol.h"
>  #include "kudu/consensus/metadata.pb.h"
>  #include "kudu/consensus/raft_consensus.h"
> +#include "kudu/gutil/casts.h"
>  #include "kudu/gutil/macros.h"
>  #include "kudu/gutil/map-util.h"
>  #include "kudu/gutil/port.h"
> IWYU would have edited 1 files on your behalf.

Thanks a lot for looking into the failure!

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304
PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished();
> How do we protect against the race when right after this 'txn_coordinator_t
Are you asking after L304 which we waited 'txn_coordinator_task_counter_' to become zero. and then it again becomes non-zero and tablet is shutdown state. what happens if TxnStatusManager tries to access the tablet? In that case,  the task will not be enabled/ran when the tablet is shutting down. So that ShouldRunTxnCoordinatorStalenessTask() will return false and no new tasks will run.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471
PS9, Line 471:   if (state_ == STOPPING || state_ == STOPPED) {
             :     return false;
             :   }
             :   return true;
> This looks a bit strange: it returns 'true' even if 'state_' is RUNNING?
Ah, good catch. Sorry it is the opposite meaning as what the method name implies here. Adjust the caller as well.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997
PS9, Line 997: state_
> What about SHUTDOWN state?
SHUTDOWN is only set after the tablet is fully stopped. Which the raft consensus has been stopped and the call back should no longer be triggered.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009
PS9, Line 1009: txn_coordinator_task_counter_
> nit: is it expected that this counter goes negative?  If not, maybe add CHE
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101
PS9, Line 101: txn_coordinator
> nit: it would be great to add a comment about the expectations about the ty
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439
PS9, Line 1439: !r->CheckRunning().ok()
> Does it mean that the TxnStalenessTrackerTask() will cease to run at this t
Hmm, yeah, I guess we wouldn't want to stop the TxnStalenessTrackerTask() if some txn status tablets is shutdown. We may want to resume on checking the other tablets.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: continue
> Maybe, we should jump not just to the next element 'replicas' container, bu
I am not sure why we want to do that? Why don't we check for the rest tablets?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 19 Jan 2021 01:15:04 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:       TxnStatusManager* txn_status_manager =
              :           dynamic_cast<TxnStatusManager*>(coordinator);
              :       TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager);
> Sounds doable for me, but wondering what is the rationale behind to perform
The rationale is to prevent code duplication: as I could see, all call sites which use TxnStatusManager::ScopedLeaderSharedLock contain this dynamic_cast piece, so why not to provide a constructor with proper signature.

Feel free to ignore this if you think it's not worth it.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 20:42:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:       FLAGS_txn_status_tablet_inject_load_failure_error = true;
> You mentioned on slack that this won't do io unless we flush. It's probably
Ack



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 06:53:16 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 3:

(13 comments)

Mostly nits, but there are a few places that seem a little suspicious w.r.t locking.

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/master/txn_manager-test.cc@137
PS3, Line 137:  // Find the replica.
             :       tablet_replica_ = LookupTabletReplica();
nit: spacing

Also why is this necessary? Doesn't the master hold a reference to this replica? Is this meant to protect against a race?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@285
PS3, Line 285:   {
Do we need to take the state change lock here? Is it OK if we proceed while racing with shutdown?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@286
PS3, Line 286:     std::unique_lock<simple_spinlock> lock(lock_);
nit: since we're not using 'lock', we can just use std::lock_guard<> here


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@312
PS3, Line 312:     // Submit to pool
             :     if (shutdown_latch_->count() == 0) {
             :      return;
             :     }
Do we still need this if we take the state change lock and hold it for the duration of this function?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@877
PS3, Line 877: RegisterTxnCoordinator
nit: this doesn't seem to be "registering" anything per se. Maybe call this MarkTxnCoordinatorRegistered() instead?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@887
PS3, Line 887: tablet_
nit: tablet_id()?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tablet/tablet_replica.cc@887
PS3, Line 887: "Not registering transaction coordinator for " << tablet_
             :                  << ": tablet not in RUNNING state";
nit: use Substitute() for better readability. Also since this isn't _not_ registering, so maybe instead say "Registering transaction coordinator while tablet is not RUNNING" or somesuch.


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.h@110
PS3, Line 110: specified
nit: I might've missed this; where is this specified?


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@204
PS3, Line 204:  
nit: drop space


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@356
PS3, Line 356:   auto cleanup = MakeScopedCleanup([&]() {
             :     status_tablet_.tablet_replica_->UpdateTxnCoordinatorReloadTask(false);
             :   });
nit: if we don't intend on cancelling this, we don't need to define a variable for it, and we can just use the SCOPED_CLEANUP macro instead


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/transactions/txn_status_manager.cc@434
PS3, Line 434:       if (!check([this]() { return this->LoadFromTabletUnlocked(); },
             :                  *consensus, term, kLoadMetaOpDescription).ok()) {
             :         return;
             :       }
nit: Why bother putting this in a lambda? IMO it makes it more difficult to read, since readers have to reason about a couple layers of function calls here.


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc@1426
PS3, Line 1426: if (r->txn_coordinator_registered()) {
It seems like there's still room for a TOCTOU race here if right after this check, we deleted the tablet.


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/tserver/ts_tablet_manager.cc@1447
PS3, Line 1447:      auto cleanup = MakeScopedCleanup([&]() {
              :         r->UpdateTxnCoordinatorStalenessTask(false);
              :       });
nit: SCOPED_CLEANUP here too



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Jan 2021 23:42:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 9:

(7 comments)

IWYU isn't happy yet:

>>> Fixing #includes in '/home/jenkins-slave/workspace/kudu-master/2/src/kudu/transactions/txn_status_manager.cc'
@@ -32,6 +32,7 @@
 #include "kudu/common/wire_protocol.h"
 #include "kudu/consensus/metadata.pb.h"
 #include "kudu/consensus/raft_consensus.h"
+#include "kudu/gutil/casts.h"
 #include "kudu/gutil/macros.h"
 #include "kudu/gutil/map-util.h"
 #include "kudu/gutil/port.h"
IWYU would have edited 1 files on your behalf.

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@304
PS9, Line 304: WaitUntilTxnCoordinatorTasksFinished();
How do we protect against the race when right after this 'txn_coordinator_task_counter_' becomes non-zero again and tablet_->Shutdown() is called when TxnStatusManager tries to access the tablet?

Should we just nullify txn_coordinator_ in WaitUntilTxnCoordinatorTasksFinished() or set another variable that controls the return status of ShouldRunTxnCoordinatorStalenessTask() along with 'state_'?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@471
PS9, Line 471:   if (state_ == STOPPING || state_ == STOPPED) {
             :     return false;
             :   }
             :   return true;
This looks a bit strange: it returns 'true' even if 'state_' is RUNNING?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@997
PS9, Line 997: state_
What about SHUTDOWN state?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tablet/tablet_replica.cc@1009
PS9, Line 1009: txn_coordinator_task_counter_
nit: is it expected that this counter goes negative?  If not, maybe add CHECK()/DCHECK() to enforce the invariant here?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/transactions/txn_status_manager.h@101
PS9, Line 101: txn_coordinator
nit: it would be great to add a comment about the expectations about the type of this parameter.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1439
PS9, Line 1439: !r->CheckRunning().ok()
Does it mean that the TxnStalenessTrackerTask() will cease to run at this tablet server once one of its txn status tablets is shutdown (e.g., moved by other server by the rebalancer tool)?  I'm not sure that's the behavior we want.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: continue
Maybe, we should jump not just to the next element 'replicas' container, but onto next iteration of the outer 'while (true)' loop?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 16 Jan 2021 06:10:15 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:       FLAGS_txn_status_tablet_inject_load_failure_error = true;
> Try to use --env_inject_eio but that seems to affect the log to be appended
You can also use --env_inject_eio_globs to target specific directories. IIRC, each mini server stores all of its data in a single directory, so something like

--env_inject_eio_globs={cluster_->mini_tablet_server(i)->server()->fs_manager()->GetDataRootDirs()[0]}/**



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 01:18:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 14:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS14, Line 447:   SleepFor(MonoDelta::FromMilliseconds(5 * keepalive_interval_ms));
nit: Why do we need to sleep for so long if we're already retrying?


http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443: 
              :       // In all other cases non-OK status is considered fatal.
              :       LOG(FATAL) << Substitute("$0 failed: $1", op_description, s.ToString());
              :       __builtin_unreachable();
Why might we get here? I thought we didn't want to crash at all? Can we get here if there's a disk failure or somesuch?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 19:53:39 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#14).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 833 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/14
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#3).

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

(wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

wip: txn_status_manager-itest is flaky. Add more comments.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 763 insertions(+), 146 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 13:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818
PS13, Line 818:   // Make sure a re-election happens before the following read.
Is this guaranteed to happen? If we need to change leadership, quiescing might be an approach. Look around for 'mutable_quiescing()' in some existing tests.


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

PS13: 
Why the changes in this file?


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214
PS13, Line 214: PREDICT_FALSE(!replica_status_.ok()) ||
              :         PREDICT_FALSE(FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error)
nit: wrap in a single PREDICT_FALSE


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400
PS13, Line 400: !s.ok() || PREDICT_FALSE
nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 08:43:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Reviewed-on: http://gerrit.cloudera.org:8080/16648
Tested-by: Hao Hao <ha...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 850 insertions(+), 140 deletions(-)

Approvals:
  Hao Hao: Verified
  Andrew Wong: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#9).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/9
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@a482
PS2, Line 482: 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
             : 
Can we at least check that we have acquired the leader lock?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@724
PS2, Line 724: auto* consensus = status_tablet_.tablet_replica_->consensus();
             :         DCHECK(consensus);
nit: can combine as:

 auto* consensus = DCHECK_NOTNULL(...consensus());


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc@1420
PS2, Line 1420: {
              :       shared_lock<RWMutex> l(lock_);
              :       for (const auto& elem : tablet_map_) {
              :         auto r = elem.second;
              :         // Find txn status tablet replicas.
              :         if (r->txn_coordinator()) {
              :           replicas.emplace_back(std::move(r));
              :         }
              :       }
              :     }
              :     for (auto& r : replicas) {
              :       if (shutdown_latch_.count() == 0) {
              :         return;
              :       }
              :       auto* coordinator = DCHECK_NOTNULL(r->txn_coordinator());
It may be worth considering updating this to look more like how we schedule maintenance ops. E.g. rather than iterating through each replica and trying to check for coordinator, instead when we create a replica that is a coordinator, "register" that coordinator with the TSTabletManager when it's fully started and ready.

Conceptually, the problems at hand seems similar, that we want to run tasks using a shared threadpool, and that the tasks are tied to the our replicas, and the replica shutting down shouldn't collide with the task running. Take a look at util/maintenance_manager.cc to see if a similar approach might make sense here, e.g. when shutting down a replica, if an abort task is in progress, we wait for it to complete, and then proceed with the shutdown. That might also help avoid races with this task and the initial call to TabletReplica::Start().



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 24 Dec 2020 00:08:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 7:

(4 comments)

Yep, I agree with Andrew: after quick looks this looks much better to me now!

I hope to get one more pass tomorrow morning.  As for now, just a few nits not related to the essence, but rather syntax and code conventions.

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426
PS6, Line 426: dynamic_cast
nit: I guess we prefer to use down_cast<> for this type of casts in the code; see src/kudu/gutil/casts.h for details


http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725
PS3, Line 725:     FLAGS_leader_failure_max_missed_heartbeat_periods = 1;
             :     FLAGS_raft_heartbeat_interval_ms = 30;
> Yes, ran looped the test in TSAN and didn't see failure caused by this. Is 
The short heartbeat interval combined with just single missing heartbeat period prompted me to make sure this test eventually converges when run under ASAN/TSAN with --stress_cpu_threads=16 or alike.  But since you verified it works as needed, it's great!


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262
PS7, Line 1262: dynamic_cast
nit: use down_cast<> here as well?


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:       TxnStatusManager* txn_status_manager =
              :           dynamic_cast<TxnStatusManager*>(coordinator);
              :       TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager);
Looking at this again and again prompts me to ask the following question: Is it possible to introduce a constructor of ScopedLeaderSharedLock with a parameter of TxnCoordinator* type?

The idea is to perform the downcast (using down_cast or dynamic_cast) under the hood, and then report an error via fist_failed_status() in debug mode if the coordinator could not be downcasted to TxnStatusManager?

What do you think?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 08:34:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 12:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329
PS11, Line 329: transaction status table and is 
> nit: just "transaction status table"
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332
PS11, Line 332: More concretely, if this returns
              :   // 'true', callers must call DecreaseTxnCoordinatorTaskCounter().
              :   //
> nit: more concretely, if this returns 'true', callers must call DecreaseTxn
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336
PS11, Line 336: 
> nit: these methods don't run anything, so how about "if the caller should r
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340
PS11, Line 340: transaction status table
              :   // and is
> nit: here too
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010
PS11, Line 1010: std::l
> nit: could use DCHECK_GE
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100
PS11, Line 100: 'txn_coordinator' must be a 'TxnStatusManager*' because of the
              :     // downcast in the initia
> nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* becau
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83:  false,
            :             "Finalize commit
> Sounds good. Curious if we have test coverage for this.
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239
PS11, Line 239: 
> Do we have tests for this?
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356
PS11, Line 356: 
> nit: why not put this directly in the if() condition? Same with other insta
Done


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358
PS11, Line 358: t get an unexpected error response and bail instead 
> nit: this might not be that useful. It'll be obvious that the tablet is shu
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 12
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 02:32:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 11:

(10 comments)

Overall looks good, just more nits and a couple of test suggestions.

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h
File src/kudu/tablet/tablet_replica.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@329
PS11, Line 329: transaction status manager table
nit: just "transaction status table"


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@332
PS11, Line 332: Thus, the caller needs to call
              :   // 'DecreaseTxnCoordinatorTaskCounter' after the task is executed or
              :   // aborted to unblock the tablet shutting down process.
nit: more concretely, if this returns 'true', callers must call DecreaseTxnCoordinatorTaskCounter(). Same below.


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@336
PS11, Line 336: successfully
nit: these methods don't run anything, so how about "if the caller should run the staleness task"

Same below


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.h@340
PS11, Line 340: transaction status manager
              :   // table 
nit: here too


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/tablet/tablet_replica.cc@1010
PS11, Line 1010: DCHECK
nit: could use DCHECK_GE


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.h@100
PS11, Line 100: The expect type of 'txn_coordinator' is a subclass of TxnCoordinator,
              :     // e.g. TxnStatusManager.
nit: more specifically, 'txn_coordinator' must be a TxnStatusManager* because of the down_cast, no?


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83: ing all operations replicated "
            :              "by the previou
> After thinking more, I decide to remove the crash logic and just return, si
Sounds good. Curious if we have test coverage for this.


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@239
PS11, Line 239: TABLET_NOT_RUNNING);
Do we have tests for this?


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@356
PS11, Line 356: status_tablet_.tablet_replica_->IsShuttingDown();
nit: why not put this directly in the if() condition? Same with other instances with task_enabled


http://gerrit.cloudera.org:8080/#/c/16648/11/src/kudu/transactions/txn_status_manager.cc@358
PS11, Line 358: "The tablet is already shutting down or shutdown. ";
nit: this might not be that useful. It'll be obvious that the tablet is shutting down from surrounding messages. This should probably just say "Not loading TxnStatusManager" or something



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 21 Jan 2021 07:24:11 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:       FLAGS_txn_status_tablet_inject_load_failure_error = true;
> Do disk failures exercise this codepath? If so, we can just use --env_injec
Try to use --env_inject_eio but that seems to affect the log to be appended. Or maybe you have suggestions on how to narrow down to only affect transaction status tablet?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 01:13:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 14: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443: 
              :       // In all other cases non-OK status is considered fatal.
              :       LOG(FATAL) << Substitute("$0 failed: $1", op_description, s.ToString());
              :       __builtin_unreachable();
> Why might we get here? I thought we didn't want to crash at all? Can we get
Actually I'd be ok addressing this as a follow-up.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 20:38:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has removed a vote on this change.

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714
PS9, Line 714:       orig_leader
> nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gon
Done


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715
PS9, Line 715: TONED, /
> nit: just put 3 here?
Address it in a follow up patch.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: nStatusM
> Because the l.first_failed_status().ok() most likely will return not-OK for
Hmm, makes sense. I will address it in a follow up commit.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 01 Feb 2021 22:57:56 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#12).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 825 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/12
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 12
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#7).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/7
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 14:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/integration-tests/txn_status_table-itest.cc@818
PS13, Line 818: class TxnStatusTableElectionStormITest : public TxnStatusTableITest {
> Is this guaranteed to happen? If we need to change leadership, quiescing mi
Done


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

PS13: 
> Why the changes in this file?
Came across this code and saw some nits can be improved (but I guess also introduced some redundant dependency). Will revert it back.


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@214
PS13, Line 214: PREDICT_FALSE(!replica_status_.ok() ||
              :                       FLAGS_txn_status_tablet_inject_uninitialized_leader_status_error)
> nit: wrap in a single PREDICT_FALSE
Done


http://gerrit.cloudera.org:8080/#/c/16648/13/src/kudu/transactions/txn_status_manager.cc@400
PS13, Line 400: PREDICT_FALSE(!s.ok() ||
> nit: maybe wrap the entire thing in PREDICT_FALSE instead of just the flag
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 14
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 08:21:54 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/15/src/kudu/integration-tests/txn_status_table-itest.cc@774
PS15, Line 774:       FLAGS_txn_status_tablet_inject_load_failure_error = true;
Do disk failures exercise this codepath? If so, we can just use --env_inject_eio.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 00:58:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@154
PS1, Line 154:                          this->TxnStatusReplicaStateChanged(this->tablet_id(), reason);
Do we also need to call cb() here?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@147
PS1, Line 147:       reload_txn_status_tablet_pool_(reload_txn_status_tablet_pool),
             :       txn_coordinator_(meta_->table_type() &&
             :                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
             :                        DCHECK_NOTNULL(txn_coordinator_factory)->Create(this) : nullptr),
             :       mark_dirty_clbk_(meta_->table_type() &&
             :                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
             :                        [this](const string& reason) {
             :                          this->TxnStatusReplicaStateChanged(this->tablet_id(), reason);
             :                        } : std::move(cb)),
> Maybe, at this point it's time to separate TxnStatusTabletReplica info a su
FWIW I think this would be more plumbing that it's worth.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302: leader_cb
nit: why does this need to be a lambda?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@189
PS1, Line 189: //
nit: remove extra slashes?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@347
PS1, Line 347:       return s; // unreachable
nit: you can use __builtin_unreachable()


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@357
PS1, Line 357: check
nit: This gets used once -- why does it need to be a lambda?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@1238
PS1, Line 1238:   tablet_copy_pool_->Shutdown();
We should to shut down our new pool here.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 27 Oct 2020 19:59:55 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 8:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:       TxnStatusManager* txn_status_manager =
              :           dynamic_cast<TxnStatusManager*>(coordinator);
              :       TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager);
> The rationale is to prevent code duplication: as I could see, all call site
Sounds good to me, will update it.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 21:02:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#15).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
17 files changed, 850 insertions(+), 140 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/15
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 16:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@714
PS9, Line 714:       orig_leader
nit: I guess this isn't necessary once the ASSERT_EVENTUALLY() below is gone.


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/integration-tests/txn_status_table-itest.cc@715
PS9, Line 715: TONED, /
nit: just put 3 here?


http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/9/src/kudu/tserver/ts_tablet_manager.cc@1457
PS9, Line 1457: nStatusM
> I am not sure why we want to do that? Why don't we check for the rest table
Because the l.first_failed_status().ok() most likely will return not-OK for the rest of the tables: l.first_failed_status().ok() doesn't depend on the elements in the container, but rather on the status of the replica behind this TxnStatusManager.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 16
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 17:11:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#6).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 769 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#10).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 766 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/10
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 10
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#8).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 760 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/8
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 8:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc
File src/kudu/client/client-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/client/client-test.cc@426
PS6, Line 426: dynamic_cast
> nit: I guess we prefer to use down_cast<> for this type of casts in the cod
Done


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/tablet_service.cc@1262
PS7, Line 1262: dynamic_cast
> nit: use down_cast<> here as well?
Done


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:       TxnStatusManager* txn_status_manager =
              :           dynamic_cast<TxnStatusManager*>(coordinator);
              :       TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager);
> Sounds good to me, will update it.
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 15 Jan 2021 05:19:06 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 3:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@154
PS1, Line 154:                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
> Do we also need to call cb() here?
Yeah, good point, I don't see reasons not to,  added it.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@147
PS1, Line 147:       cmeta_manager_(DCHECK_NOTNULL(std::move(cmeta_manager))),
             :       local_peer_pb_(std::move(local_peer_pb)),
             :       log_anchor_registry_(new LogAnchorRegistry()),
             :       apply_pool_(apply_pool),
             :       reload_txn_status_tablet_pool_(reload_txn_status_tablet_pool),
             :       shutdown_latch_(shutdown_latch),
             :       txn_coordinator_(meta_->table_type() &&
             :                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
             :                        DCHECK_NOTNULL(txn_
> FWIW I think this would be more plumbing that it's worth.
Agree with Andrew on this. Alexey, let me know if you think otherwise


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302: PREFIX(WA
> nit: why does this need to be a lambda?
Just want to be clear this is for leadership change cb, but I guess it is a bit redundant. Removed.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302:     LOG_WITH_PREFIX(WARNIN
> Just curious: is this going to be called if leadership status hasn't change
Right, this is going to be called for any leadership status change, even though the former leader can wins the latest election again. As we discussed offline, it is not clear what the performance implication will be, so adding a TODO in case we think optimization of not reloading the metadata for this case.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h@95
PS1, Line 95:     explicit ScopedLeaderSharedLock(TxnStatusManager* txn_status);
> warning: invalid case style for parameter 'txn_status_' [readability-identi
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@62
PS1, Line 62: DEFINE_int32(txn_status_manager_inject_latency_load_from_tablet_ms, 0,
> warning: using decl 'CoordinateTransactionResponsePB' is unused [misc-unuse
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@83
PS1, Line 83: 
> nit: construction
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@189
PS1, Line 189:   
> nit: remove extra slashes?
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@300
PS1, Line 300:   if (consensus->role() != RaftPeerPB::LEADER) {
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@324
PS1, Line 324:   RETURN_NOT_OK(status_tablet_.tablet_replica_->op_tracker()->WaitForAllToFinish(timeout));
> warning: the parameter 'func' is copied for each invocation but only used a
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@347
PS1, Line 347:     return Status::NotAuthoriz
> nit: you can use __builtin_unreachable()
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@357
PS1, Line 357: table
> nit: This gets used once -- why does it need to be a lambda?
Hmm, since 'check' expects 'std::function<Status()>', what do you suggest instead of using lambda?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@202
PS2, Line 202:     return;
> What happens if an instance of this lock is being constructed and the table
Right, updated the code to ensure the lock is only hold for tablet replica that is fully started as Andrew suggested.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@386
PS1, Line 386: disks
> nit: data directories ?
Done


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@389
PS1, Line 389:   RETURN_NOT_OK(ThreadPoolBuilder("tablet-delete")
             :                 .set_max_threads(max_delete_threads)
             :                 .Build(&delete_tablet_pool_));
> Is it necessary to gracefully shutdown this pool along with other tablet po
Yeah, added the shutdown along with the other pools.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@1238
PS1, Line 1238:     std::lock_guard<RWMutex> loc
> We should to shut down our new pool here.
Done


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/tserver/ts_tablet_manager.cc@1420
PS2, Line 1420: vector<scoped_refptr<TabletReplica>> replicas;
              :     {
              :       shared_lock<RWMutex> l(lock_);
              :       for (const auto& elem : tablet_map_) {
              :         auto r = elem.second;
              :         // Find txn status tablet replicas.
              :         if (r->txn_coordinator_registered()) {
              :           replicas.emplace_back(std::move(r));
              :         }
              :       }
              :     }
              :     for (auto& r : replicas) {
              :       if (shutdown_latch_.count() == 0) {
              :         return;
              :       }
> It may be worth considering updating this to look more like how we schedule
Makes sense, updated the code accordingly. Now txn_status_tablet-itest seems to be longer flaky, http://dist-test.cloudera.org/job?job_id=hao.hao.1609829456.46540, however, txn_status_manager-itest is still flaky, investigating it.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 05 Jan 2021 07:01:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/integration-tests/txn_status_table-itest.cc@756
PS2, Line 756:           ASSERT_OK(s);
             :           ASSERT_TRUE(txn_status.has_user());
             :           ASSERT_STREQ(kUser, txn_status.user().c_str());
             :           ASSERT_TRUE(txn_status.has_state());
             :           ASSERT_EQ(TxnStatePB::OPEN, txn_status.state());
Does this work as expected when an assertion triggers?  I remember there were issues with that, so in this sort of construction we tended to use CHECK_ equivalents.


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/integration-tests/txn_status_table-itest.cc@764
PS2, Line 764: std::for_each(threads.begin(), threads.end(), [&] (thread& t) { t.join(); });
If leave those ASSERTs above (i.e. not converting them into CHECKs), this should be put into a scoped cleanup, right?  Otherwise, if an assertion above triggers, the threads would not be joined?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/master/txn_manager-test.cc
File src/kudu/master/txn_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/master/txn_manager-test.cc@150
PS2, Line 150:  
nit: remove the extra space


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/master/txn_manager-test.cc@151
PS2, Line 151: cluster_->mini_tablet_server(0)->server()->tablet_manager()->GetTabletReplicas(&replicas)
I'm not sure this waits or assumes any leadership status, saw GetTabletReplicas() simply calls AppendValuesFromMap() under the hood, no?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/master/txn_manager-test.cc@161
PS2, Line 161: tablet_replica_
What tablet replica is it and where is it used besides TearDown() and Start()?


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.h@95
PS2, Line 95: txn_status
nit: txn_status_manager


http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/2/src/kudu/transactions/txn_status_manager.cc@202
PS2, Line 202:   const string& uuid = txn_status_manager_->status_tablet_.tablet_replica_->permanent_uuid();
What happens if an instance of this lock is being constructed and the tablet replica is reset upon calling  https://gerrit.cloudera.org/#/c/16648/2/src/kudu/tablet/tablet_replica.cc@211 TSTabletManager::CreateNewTablet() ?


This has triggered the following TSAN warnings:

WARNING: ThreadSanitizer: data race (pid=972)                                   
  Write of size 8 at 0x7b5400056548 by thread T152 (mutexes: write M634861673190352280, write M58672
    #0 std::__1::enable_if<(is_move_constructible<kudu::tablet::Tablet*>::value) && (is_move_assigna
    #1 std::__1::shared_ptr<kudu::tablet::Tablet>::swap(std::__1::shared_ptr<kudu::tablet::Tablet>&)
    #2 std::__1::shared_ptr<kudu::tablet::Tablet>::operator=(std::__1::shared_ptr<kudu::tablet::Tabl
    #3 kudu::tablet::TabletReplica::Start(kudu::consensus::ConsensusBootstrapInfo const&, std::__1::
    #4 kudu::tserver::TSTabletManager::OpenTablet(scoped_refptr<kudu::tablet::TabletReplica> const&,
    #5 kudu::tserver::TSTabletManager::CreateNewTablet(std::__1::basic_string<char, std::__1::char_t
    #6 decltype(std::__1::forward<kudu::tserver::TSTabletManager::CreateNewTablet(std::__1::basic_st
    #7 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::tserver::TSTabletManager::Cre
    #8 std::__1::__function::__alloc_func<kudu::tserver::TSTabletManager::CreateNewTablet(std::__1::
    #9 std::__1::__function::__func<kudu::tserver::TSTabletManager::CreateNewTablet(std::__1::basic_
    #10 std::__1::__function::__value_func<void ()>::operator()() const /data/1/hao/Repositories/kud
    #11 std::__1::function<void ()>::operator()() const /data/1/hao/Repositories/kudu/thirdparty/ins
    #12 kudu::ThreadPool::DispatchThread() /data/1/hao/Repositories/kudu/src/kudu/util/threadpool.cc
    #13 kudu::ThreadPool::CreateThread()::$_1::operator()() const /data/1/hao/Repositories/kudu/src/
    #14 decltype(std::__1::forward<kudu::ThreadPool::CreateThread()::$_1&>(fp)()) std::__1::__invoke
    #15 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::ThreadPool::CreateThread()::
    #16 std::__1::__function::__alloc_func<kudu::ThreadPool::CreateThread()::$_1, std::__1::allocato
    #17 std::__1::__function::__func<kudu::ThreadPool::CreateThread()::$_1, std::__1::allocator<kudu
    #18 std::__1::__function::__value_func<void ()>::operator()() const /data/1/hao/Repositories/kud
    #19 std::__1::function<void ()>::operator()() const /data/1/hao/Repositories/kudu/thirdparty/ins
    #20 kudu::Thread::SuperviseThread(void*) /data/1/hao/Repositories/kudu/src/kudu/util/thread.cc:6
                                                                                
  Previous read of size 8 at 0x7b5400056548 by thread T150 (mutexes: read M586447839757432840):
    #0 std::__1::shared_ptr<kudu::tablet::Tablet>::operator->() const /data/1/hao/Repositories/kudu/
    #1 kudu::tablet::TabletReplica::permanent_uuid() const /data/1/hao/Repositories/kudu/src/kudu/ta
    #2 kudu::transactions::TxnStatusManager::ScopedLeaderSharedLock::ScopedLeaderSharedLock(kudu::tr
    #3 kudu::tserver::TSTabletManager::TxnStalenessTrackerTask() /data/1/hao/Repositories/kudu/src/k
    #4 kudu::tserver::TSTabletManager::Init()::$_8::operator()() const /data/1/hao/Repositories/kudu
    #5 decltype(std::__1::forward<kudu::tserver::TSTabletManager::Init()::$_8&>(fp)()) std::__1::__i
    #6 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::tserver::TSTabletManager::Ini
    #7 std::__1::__function::__alloc_func<kudu::tserver::TSTabletManager::Init()::$_8, std::__1::all
    #8 std::__1::__function::__func<kudu::tserver::TSTabletManager::Init()::$_8, std::__1::allocator
    #9 std::__1::__function::__value_func<void ()>::operator()() const /data/1/hao/Repositories/kudu
    #10 std::__1::function<void ()>::operator()() const /data/1/hao/Repositories/kudu/thirdparty/ins
    #11 kudu::ThreadPool::DispatchThread() /data/1/hao/Repositories/kudu/src/kudu/util/threadpool.cc
    #12 kudu::ThreadPool::CreateThread()::$_1::operator()() const /data/1/hao/Repositories/kudu/src/
    #13 decltype(std::__1::forward<kudu::ThreadPool::CreateThread()::$_1&>(fp)()) std::__1::__invoke
    #14 void std::__1::__invoke_void_return_wrapper<void>::__call<kudu::ThreadPool::CreateThread()::
    #15 std::__1::__function::__alloc_func<kudu::ThreadPool::CreateThread()::$_1, std::__1::allocato
    #16 std::__1::__function::__func<kudu::ThreadPool::CreateThread()::$_1, std::__1::allocator<kudu
    #17 std::__1::__function::__value_func<void ()>::operator()() const /data/1/hao/Repositories/kud
    #18 std::__1::function<void ()>::operator()() const /data/1/hao/Repositories/kudu/thirdparty/ins
    #19 kudu::Thread::SuperviseThread(void*) /data/1/hao/Repositories/kudu/src/kudu/util/thread.cc:6



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 23 Dec 2020 21:45:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 6:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@744
PS6, Line 744: ;
> nit: drop extra semicolon
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@751
PS6, Line 751: threads.emplace_back([&] {
> What does the multithreading add here? Seems like it complicates things sin
The point is to ensure concurrent txn reads/writes have no failures with frequent leader elections. Add the comment to be clear.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@758
PS6, Line 758:         if (s.ok()) {
> What failures are we expecting?
Add a comment


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/integration-tests/txn_status_table-itest.cc@748
PS6, Line 748:   std::atomic<int64_t> next_thread_id = 0;
             :   vector<int> txn_id_index = { 1, 6, 11, 16, 21};
             :   for (int t = 0; t < 5; t++) {
             :     threads.emplace_back([&] {
             :       int thread_id = next_thread_id++;
             :       CHECK_LT(thread_id, txn_id_index.size());
             :       int start_txn_id = txn_id_index[thread_id];
             :       for (int i = start_txn_id; i < start_txn_id + 5; i++) {
             :         TxnStatusEntryPB txn_status;
             :         Status s = txn_sys_client_->BeginTransaction(i, kUser);
             :         if (s.ok()) {
             :           // Make sure a re-election happens before the following read.
             :           SleepFor(MonoDelta::FromMilliseconds(3 * FLAGS_raft_heartbeat_interval_ms));
             :           s = txn_sys_client_->GetTransactionStatus(i, kUser, &txn_status);
             :           CHECK_OK(s);
             :           CHECK(txn_status.has_user());
             :           CHECK_STREQ(kUser, txn_status.user().c_str());
             :           CHECK(txn_status.has_state());
             :           CHECK_EQ(TxnStatePB::OPEN, txn_status.state());
             :         }
             :       }
             :     });
             :   }
> nit: would be easier to read as:
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@403
PS6, Line 403: << 
> nit: maybe add a message here too indicating what's going on
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@472
PS6, Line 472: LOG(WARNING) << Substitute("The tablet is already shutting down or shutdown. State: $0",
             :                                TabletStatePB_Name(state_));
> nit: might be cleaner to let the callers of these otherwise simple function
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@979
PS6, Line 979:   if (!txn_coordinator_) {
             :     return false;
             :   }
> nit: maybe only call this if txn_coordinator_ is set and change this to a D
I updated the name to reflect the functionality which is ShouldRunTxnCoordinatorStalenessTask(). So keep it as it is here.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tablet/tablet_replica.cc@997
PS6, Line 997: state_ == STOPPING || state_ == STOPPED) {
> nit: I'm curious, why not reuse the != RUNNING condition from above? A comm
Added a comment.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@109
PS6, Line 109: but some operations
             :     // may still be legal.
> I might be missing something but is this fact used anywhere? What operation
Sorry this is not used, removed it.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@106
PS6, Line 106:     const Status& replica_status() const { return replica_status_; }
             : 
             :     // Leadership status of the transaction status manager. If not OK, the
             :     // transaction status manager is not the leader, but some operations
             :     // may still be legal.
             :     const Status& leader_status() const {
             :       return leader_status_;
             :     }
> nit: Are these ever used?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.h@118
PS6, Line 118: if (!replica_status_.ok()) {
             :         return replica_status_;
             :       }
             :       return leader_status_;
> nit: a bit simpler as
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@83
PS6, Line 83: If this time is exceeded, the "
            :              "node crashes."
> If the consequence is so severe, maybe we should set this to be something l
After thinking more, I decide to remove the crash logic and just return, since node crash seems to be too severe for timeout on OpTracker::WaitForAllToFinish(). Now, if timeout, it will trigger the client to get a ServiceUnavailable error and they can retry until eventually succeed (or fail). Let me know if you think otherwise.


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@217
PS6, Line 217: leader_ready_term != initial_term_ ||
             :                     !leader_shared_lock_.owns_lock())
> nit: Curious, doesn't this mean that we've changed leaders? The error messa
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@235
PS6, Line 235: NOT_THE_LEADER
> Should this only be set if leader_status_ is set?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@320
PS6, Line 320: RETURN_NOT_OK
> nit: just return this?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@369
PS6, Line 369:   }
> nit: maybe log something about beginning to wait?
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@414
PS6, Line 414: failed
> nit: "interrupted" would be less alarming, given this is somewhat normal
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/transactions/txn_status_manager.cc@430
PS6, Line 430: Substitute("Tablet")
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/6/src/kudu/tserver/ts_tablet_manager.cc@1428
PS6, Line 1428: RunTx
> nit: maybe name these ShouldRun...
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 07:31:25 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 11: Verified+1

TSAN build failure doesn't seem to relate to the patch.


-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 19 Jan 2021 18:06:06 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#11).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 766 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/11
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#13).

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
18 files changed, 825 insertions(+), 144 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/13
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 1:

(6 comments)

Just a quick first pass.

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc
File src/kudu/tablet/tablet_replica.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@147
PS1, Line 147:       reload_txn_status_tablet_pool_(reload_txn_status_tablet_pool),
             :       txn_coordinator_(meta_->table_type() &&
             :                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
             :                        DCHECK_NOTNULL(txn_coordinator_factory)->Create(this) : nullptr),
             :       mark_dirty_clbk_(meta_->table_type() &&
             :                        *meta_->table_type() == TableTypePB::TXN_STATUS_TABLE ?
             :                        [this](const string& reason) {
             :                          this->TxnStatusReplicaStateChanged(this->tablet_id(), reason);
             :                        } : std::move(cb)),
Maybe, at this point it's time to separate TxnStatusTabletReplica info a sub-class of TabletReplica, and instantiating one ore another in TSTabletManager::CreateAndRegisterTabletReplica() based on meta_->table_type()?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tablet/tablet_replica.cc@302
PS1, Line 302:     CHECK_OK(leader_cb());
Just curious: is this going to be called if leadership status hasn't changed for a replica, i.e. if the former leader wins latest election round again?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h
File src/kudu/transactions/txn_status_manager.h:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.h@87
PS1, Line 87: class ScopedLeaderSharedLock {
Would it make sense to extract the common part of the code from CatalogManager::ScopedLeaderSharedLock and this class into some common class (maybe, using template class approach) and re-use the common code in catalog manager and here?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/transactions/txn_status_manager.cc@83
PS1, Line 83: constrution
nit: construction

Also, add a stop in the end of the sentence.


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@386
PS1, Line 386: disks
nit: data directories ?


http://gerrit.cloudera.org:8080/#/c/16648/1/src/kudu/tserver/ts_tablet_manager.cc@389
PS1, Line 389:   RETURN_NOT_OK(ThreadPoolBuilder("txn-status-tablet-reload")
             :                 .set_max_threads(max_reload_threads)
             :                 .Build(&reload_txn_status_tablet_pool_));
Is it necessary to gracefully shutdown this pool along with other tablet pools uses in TsTabletManager or it's OK to rely on its destructor's logic as is?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 26 Oct 2020 23:14:43 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#2).

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

(wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

wip: fix the flaky tests.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
16 files changed, 617 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16648

to look at the new patch set (#5).

Change subject: (wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................

(wip) KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Currently, even though a non-leader TxnStatusManager will be unable to
write to the underlying table (in the Raft subsystem the write would be
aborted), we may want to restrict calls to be made by the leader
TxnStatusManagers only. The motivation is to provide a more robust system,
which avoids cases when the request was sent to a laggy follower, we may
end up failing the request with an error.

This patch introduces ScopedLeaderSharedLock (similar to the one in Catalog
Manager) to be used to ensure the requests were sent to leaders only and
to block all other operations while reloading the persistent transaction
status metadata upon leadership changes. Note that during failover the
leader replica will wait until in-flight ops in the previous consensus
term to be applied before reloading the metadata.

Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
---
A src/2.patch
A src/3.patch
M src/kudu/client/client-test.cc
M src/kudu/integration-tests/ts_tablet_manager-itest.cc
M src/kudu/integration-tests/txn_status_manager-itest.cc
M src/kudu/integration-tests/txn_status_table-itest.cc
M src/kudu/master/sys_catalog.cc
M src/kudu/master/txn_manager-test.cc
M src/kudu/tablet/tablet_replica-test-base.cc
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tablet/tablet_replica.h
M src/kudu/tablet/txn_coordinator.h
M src/kudu/transactions/txn_status_manager-test.cc
M src/kudu/transactions/txn_status_manager.cc
M src/kudu/transactions/txn_status_manager.h
M src/kudu/transactions/txn_status_tablet.h
M src/kudu/tserver/tablet_copy_source_session-test.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
20 files changed, 2,011 insertions(+), 145 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/48/16648/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 5
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 13: Verified+1

Unrelated flaky test ExternalMiniClusterTest.TestBasicOperation


-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 13
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 29 Jan 2021 06:52:16 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc
File src/kudu/integration-tests/txn_status_manager-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/integration-tests/txn_status_manager-itest.cc@447
PS14, Line 447:   ASSERT_EVENTUALLY([&]{
> nit: Why do we need to sleep for so long if we're already retrying?
Done


http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc
File src/kudu/transactions/txn_status_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/14/src/kudu/transactions/txn_status_manager.cc@443
PS14, Line 443: 
              :       const int64_t term = consensus.CurrentTerm();
              :       // If the term has changed we assume the new leader is about to do the
              :       // necessary work in its
> Why might we get here? I thought we didn't want to crash at all? Can we get
Good point! It is overkilled to crash here if there is any failures preventing read from the transaction status tablet. Updated.



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sat, 30 Jan 2021 23:51:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 15: Verified+1

Unrelated org.apache.kudu.client.TestKuduMetrics flaky test failure.


-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 15
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Sun, 31 Jan 2021 00:31:39 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2612: restrict TxnStatusManager calls to be made by the leader only

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/16648 )

Change subject: KUDU-2612: restrict TxnStatusManager calls to be made by the leader only
......................................................................


Patch Set 8:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc
File src/kudu/integration-tests/txn_status_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/16648/3/src/kudu/integration-tests/txn_status_table-itest.cc@725
PS3, Line 725:     FLAGS_leader_failure_max_missed_heartbeat_periods = 1;
             :     FLAGS_raft_heartbeat_interval_ms = 30;
> The short heartbeat interval combined with just single missing heartbeat pe
Yes, when I looped I was using --stress_cpu_threads=16.


http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/16648/7/src/kudu/tserver/ts_tablet_manager.cc@1453
PS7, Line 1453:       TxnStatusManager* txn_status_manager =
              :           dynamic_cast<TxnStatusManager*>(coordinator);
              :       TxnStatusManager::ScopedLeaderSharedLock l(txn_status_manager);
> Looking at this again and again prompts me to ask the following question: I
Sounds doable for me, but wondering what is the rationale behind to perform the downcast under the hood vs. doing it here?



-- 
To view, visit http://gerrit.cloudera.org:8080/16648
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I42c1ad095dcb4bdffcbe0ecf9631a60bac208c2a
Gerrit-Change-Number: 16648
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 14 Jan 2021 19:56:43 +0000
Gerrit-HasComments: Yes