You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Hao Hao (Code Review)" <ge...@cloudera.org> on 2021/03/08 08:07:47 UTC

[kudu-CR] (wip) KUDU-2612: acquire and release PartitionLock

Hao Hao has uploaded this change for review. ( http://gerrit.cloudera.org:8080/17159


Change subject: (wip) KUDU-2612: acquire and release PartitionLock
......................................................................

(wip) KUDU-2612: acquire and release PartitionLock

This patch introduces the logic for acquiring and releasing the
PartitionLock at ParticipantOp for BEGIN_TXN, FINALIZE_COMMIT
and ABORT_TXN.

wip: move code for ScopedPartitionLock to the previous patch,
     add acquire and release PartitionLock at WriteOp,
     add tests.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/tablet/lock_manager.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
6 files changed, 91 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/1
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc
File src/kudu/tablet/lock_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc@401
PS1, Line 401: }
> For consistency, maybe add a check that other != this ?
Will update it in the other CR.



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 11: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 14 Apr 2021 01:32:40 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 8:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813
PS3, Line 813:     auto resp_error = StatusFromPB(resp.error().status());
> This is marked 'Done' but I don't see that a scenario with calling Particip
Done as TxnParticipantITest.TestTxnSystemClientRepeatCalls


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930
PS3, Line 930: // TODO(awong): update this when impleme
> nit: this is marked as 'Done', and I saw TODO is added in other places.  Is
Done


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc@282
PS5, Line 282:         ASSERT_EQ(txns, replicas[i]->tablet()->txn_participant()->GetTxnsForTests());
             :       });
             :     }
> nit: is this change essential?  If not, maybe it's better to return back to
Ah I added this because the list of transactions can be quite long, and a difference in count is a bit easier to decipher and think about than comparing the full lists that get printed out. I can revert it -- it was more useful for debugging and can be added back if we ever need to debug this further.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h
File src/kudu/tablet/ops/participant_op.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75
PS3, Line 75: ors the given 'op_id
> Can this method be invoked as a part of larger operation which might have a
I'm going to punt on this. I think there'd be a decent amount of plumbing to get the client's deadline, and it'd also be for leaders only. I added a TODO instead.


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc@212
PS5, Line 212: "START. Timestamp: $0", clock::HybridClock::GetPhysicalValueMicros(state_->times
> How to release the partition lock if the following code kicks in when the d
If replication fails, the ScopedPartitionLock is dropped, and if that was the last reference to the partition lock, the lock is released.


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172
PS5, Line 172:   // Acquire a shared lock on the given transaction, to ensure the
             :   // transaction's state doesn't change while the given write is in flight.
             :   Status AcquireTxnLock(int64_t txn_id, WriteOpState* op_state);
> BTW, why do we need this method when we already have a similar one for Writ
IIRC from my discussion with Hao, doing this in the BEGIN_TXN lets us catch races earlier. OTOH, thinking about it more, we ought to instead have the goal of taking the lock for as little time as possible, so I've gone and updated this so we only take the lock for writes.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165
PS3, Line 165: If 'must_acquire' is true,
             :   // then wait until the lock is acquired
> Write operations are executed in the context of an RPC which have a timeout
The op will proceed past the timeout. It might be avoidable, but I'm not sure it's a dealbreaker. At least, outside of the dispatcher, we don't have checks along the write path for timeouts to exit early, and at least in the context of the OpDrivers, I don't think it's worth the fuss for a single op, given we have similarly timed acquires on the write path already.


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc
File src/kudu/tablet/txn_participant.cc:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/txn_participant.cc@41
PS5, Line 41: namespace kudu {
> warning: using decl 'Substitute' is unused [misc-unused-using-decls]
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 07 Apr 2021 07:48:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574
PS9, Line 574:     CHECK(!s.ok()) << s.ToString();
             :     completio
> I guess DCHECK() wouldn't trigger in case of s.ok() because there is LOG(DF
Ah that's true. I'll change this to a CHECK to be a bit more conservative, and given it's easily verifiable today from this code snippet at least.



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:28:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#9) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
15 files changed, 604 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/9
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 4:

(11 comments)

Addressed partial comments, will try to address the rest later.

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530
PS3, Line 530:     //                For example, Status::IllegalState() originated from
             :     //                TabletServerErrorPB::TXN_ILLEGAL_STATE response and
             : 
> +1
No, this is not about TXN_LOCKED_RETRY. It is about error handling of TxnOpDispatcher. IIUC, TxnOpDispatcher currently only wraps the status of the call but not the error code. So TXN_LOCKED_ABORT is retried even it should not be (and handled by L510) for transactional ones. But for transactional ones it is taking care by L510.

For TXN_LOCKED_RETRY_OP we return it with ServiceUnavailable Status. so now it is handled in L473. But added the specific checking there.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813
PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) {
> Do you mind adding a scenario with calling ParticipateInTransaction() for t
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833
PS3, Line 833: s.IsAborted
> Hmm, interesting: what actor is to abort the transaction?  I'm curious abou
Sure, will add a comment. Now TxnManager will just pass the error back to the client, which will cause the op to fail. And the client need to figure out what to do with the error (instead of TxnManager silently abort the transaction). Or do you have other suggestions?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930
PS3, Line 930: FLAGS_enable_txn_partition_lock = false;
> It would be great if you could add a note explaining that this scenario bec
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046
PS3, Line 1046: 
> nit: for completeness sake, maybe commit `second_txn` and ensure we no long
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123
PS3, Line 123:   }
> Probably, we want to distinguish in the trace whether this is was a no-op o
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67
PS3, Line 67: DEFINE_int32(tablet_inject_latency_on_apply_write_op_ms, 0,
            :              "How much latency to inject when a write op is applied. "
            :              "For testing only!");
            : TAG_FLAG(tablet_inject_latency_on_apply_write_op_ms, un
> Could we just reuse --enable_txn_partition_lock? When might we want to set 
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569
PS3, Line 569:     }
> LOG(DFATAL) or LOG(FATAL)?
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166
PS3, Line 166: return
             :   // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error
> How is it to return those if the return type of this method is Status?
Ah, sorry for not being precise. Will update the comment.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h
File src/kudu/tablet/txn_participant.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220
PS3, Line 220:     partition_lock_ = std::move(partition_lock);
> nit: maybe DCHECK or CHECK that we own the lock?
Done


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc
File src/kudu/transactions/participant_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180
PS3, Line 180:       case TabletServerErrorPB::TXN_ILLEGAL_STATE
> Can we also explicitly handle TXN_LOCKED_RETRY_OP?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 4
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 12 Mar 2021 08:02:59 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#8) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 604 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/8
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 8
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#6) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into participant and write ops for
transactional and non-transactional operations. For each transaction, we
try to acquire the partition lock in the ParticipantOp prepare phase of
BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied.

Moreover, we take the partition lock for non-transactional write ops as
well to ensure we don’t have duplicate keys. If the partition lock
cannot be acquired, the transaction (or write op) is aborted or retried.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 552 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/6
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 6
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 11: Verified+1

Test failure is unrelated: MiniRangerTest.TestGrantPrivilege

Seems to be an issue with starting up the mini cluster.


-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 14 Apr 2021 00:19:59 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has removed a vote on this change.

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: acquire and release PartitionLock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: (wip) KUDU-2612: acquire and release PartitionLock
......................................................................


Patch Set 1:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG@13
PS1, Line 13: wip: 
We'll also need to make sure the TxnOpDispatcher handles the error cases well


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@95
PS1, Line 95:  else if (code == TabletServerErrorPB::TXN_LOCKED_RETRY_OP) {
nit: based on the current code, this cannot be any other code, right? If so, maybe just use an else {} and DCHECK on the code?


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@199
PS1, Line 199: state_->tablet_replica()
nit: can use 'replica'


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@200
PS1, Line 200:   case ParticipantOpPB::BEGIN_COMMIT:
Shouldn't there be a 'break' before this line? Or are we taking the partition lock for each of these ops?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Mon, 08 Mar 2021 19:52:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 3:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@7
PS3, Line 7: acquire and release partition lock
Do you plan to add an end-to-end test scenario into client.cc or alike to verify how this works if looking from the client side?


http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@13
PS3, Line 13: parition
partition


http://gerrit.cloudera.org:8080/#/c/17159/3//COMMIT_MSG@16
PS3, Line 16: will be aborted or retried
Do you mind extending this a bit to explain what actor is to abort or retry corresponding transaction/write op?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530
PS3, Line 530:     //                Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT
             :     //                response are needlessly retried.
             : 
> Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be
+1

If that's about TXN_LOCKED_RETRY, I agree that it's better to handle TXN_LOCKED_RETRY as a specific error code than generic IsIllegalState()


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813
PS3, Line 813: TEST_F(TxnParticipantITest, TestTxnSystemClientBeginTxnLockAbort) {
Do you mind adding a scenario with calling ParticipateInTransaction() for the same txn_id multiple times?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@833
PS3, Line 833: s.IsAborted
Hmm, interesting: what actor is to abort the transaction?  I'm curious about the bigger picture: I was under impression that it's the TxnManager who takes care of that, right?  Could you add a comment clarifying that at this point kSecondTxn isn't aborted yet (well, it's not even started, actually, right)?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930
PS3, Line 930: FLAGS_enable_txn_partition_lock = false;
It would be great if you could add a note explaining that this scenario became a fully synthetic one with the introduction of coarse-grain locking.  It might be a TODO reminding to remove the override for the flag once fine-grained locking appears.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h
File src/kudu/tablet/ops/participant_op.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.h@75
PS3, Line 75: AcquirePartitionLock
Can this method be invoked as a part of larger operation which might have a deadline?   If so, should this method take an optional 'deadline' parameter as well?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/participant_op.cc@123
PS3, Line 123:   TRACE("Released partition lock");
Probably, we want to distinguish in the trace whether this is was a no-op or actual release of the lock, no?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@165
PS3, Line 165: If 'must_acquire' is true,
             :   // then wait until the lock is acquired
Write operations are executed in the context of an RPC which have a timeout.  What happens if a lock is still being acquired but the operation has timed out?  Could this end up in accumulating too many ops in the queue?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/tablet.h@166
PS3, Line 166: return
             :   // 'TXN_LOCKED_ABORT' or 'TXN_LOCKED_RETRY_OP' error
How is it to return those if the return type of this method is Status?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 11 Mar 2021 22:11:50 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17159

to look at the new patch set (#3).

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch introduces the logic for actual acquiring and releasing
the partition lock for transactions and non-transactional operations.
For each transaction, we try to acquire the partition lock in the
ParticipantOp prepare phase of BEGIN_TXN and release the lock when
COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the parition
lock for non-transactional operations as well to ensure we don’t
have duplicate keys. If the partition lock cannot be acquired,
the transaction (or write op) will be aborted or retried.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
12 files changed, 376 insertions(+), 13 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/3
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 9:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc@352
PS9, Line 352: error.IsTimedOut()
> nit: could you add a comment explaining why both IsAborted() and IsTimedOut
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h
File src/kudu/tablet/ops/write_op.h:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h@179
PS9, Line 179: locks
> nit: lock?
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@217
PS9, Line 217: rows
> nit: row
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574
PS9, Line 574:     completion_callback()->set_error(s, code);
             :     return s;
> nit: does it make sense to add CHECK(!s.ok()) before setting the error code
Done as a DCHECK


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc@525
PS9, Line 525:     Status s = Write(0);
             :     ASSERT_TRUE(s.IsAborted()) << s.ToString();
             :     ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another");
             : 
             :     s = Write(0, kTxnTwo);
             :     ASSERT_TRUE(s.IsAborted()) << s.ToString();
             :     ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another");
> nit: is it possible to turn this into a functor and reuse it in the code of
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h
File src/kudu/tablet/txn_participant.h:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115
PS9, Line 115: It is thus expected that repeat callers are taking
             :   // the same lock
> nit: is it possible to check for this invariant in the implementation of th
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@117
PS9, Line 117: AcquirePartitionLock
> nit: would AdoptPartitionLock() be a better name for this method?
Done


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc
File src/kudu/tablet/txn_participant.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc@78
PS9, Line 78:   TabletServerErrorPB::Code code = tserver::TabletServerErrorPB::UNKNOWN_ERROR;
> nit: maybe, move this under the 'if ()' clause below?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:03:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 9: Code-Review+2

(9 comments)

Overall looks good!

Just a few nits.

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/integration-tests/txn_write_ops-itest.cc@352
PS9, Line 352: error.IsTimedOut()
nit: could you add a comment explaining why both IsAborted() and IsTimedOut() are possible result codes here?


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h
File src/kudu/tablet/ops/write_op.h:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.h@179
PS9, Line 179: locks
nit: lock?


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@217
PS9, Line 217: rows
nit: row


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574
PS9, Line 574:     completion_callback()->set_error(s, code);
             :     return s;
nit: does it make sense to add CHECK(!s.ok()) before setting the error code and returning?


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172
PS5, Line 172:   // Acquire a shared lock on the given transaction, to ensure the
             :   // transaction's state doesn't change while the given write is in flight.
             :   Status AcquireTxnLock(int64_t txn_id, WriteOpState* op_state);
> IIRC from my discussion with Hao, doing this in the BEGIN_TXN lets us catch
Thank you very much for going an extra mile!  I think it's crucial to shrink the lock time to have better throughput for concurrent operations.


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc
File src/kudu/tablet/txn_participant-test.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant-test.cc@525
PS9, Line 525:     Status s = Write(0);
             :     ASSERT_TRUE(s.IsAborted()) << s.ToString();
             :     ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another");
             : 
             :     s = Write(0, kTxnTwo);
             :     ASSERT_TRUE(s.IsAborted()) << s.ToString();
             :     ASSERT_STR_CONTAINS(s.ToString(), "partition lock that is held by another");
nit: is it possible to turn this into a functor and reuse it in the code of this scenario?


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h
File src/kudu/tablet/txn_participant.h:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115
PS9, Line 115: It is thus expected that repeat callers are taking
             :   // the same lock
nit: is it possible to check for this invariant in the implementation of this method?


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@117
PS9, Line 117: AcquirePartitionLock
nit: would AdoptPartitionLock() be a better name for this method?


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc
File src/kudu/tablet/txn_participant.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.cc@78
PS9, Line 78:   TabletServerErrorPB::Code code = tserver::TabletServerErrorPB::UNKNOWN_ERROR;
nit: maybe, move this under the 'if ()' clause below?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 9
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Apr 2021 05:59:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] (wip) KUDU-2612: acquire and release PartitionLock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17159

to look at the new patch set (#2).

Change subject: (wip) KUDU-2612: acquire and release PartitionLock
......................................................................

(wip) KUDU-2612: acquire and release PartitionLock

This patch introduces the logic for acquiring and releasing the
PartitionLock at ParticipantOp for BEGIN_TXN, FINALIZE_COMMIT
and ABORT_TXN.

wip: move code for ScopedPartitionLock to the previous patch,
     add acquire and release PartitionLock at WriteOp,
     update TxnOpDispatcher error handling,
     add tests.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/tablet/lock_manager.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
8 files changed, 138 insertions(+), 8 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/2
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 2
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#7) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 593 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/7
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 7
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] (wip) KUDU-2612: acquire and release PartitionLock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: (wip) KUDU-2612: acquire and release PartitionLock
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc
File src/kudu/tablet/lock_manager.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/lock_manager.cc@401
PS1, Line 401: void ScopedPartitionLock::TakeState(ScopedPartitionLock* other) {
For consistency, maybe add a check that other != this ?


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@300
PS1, Line 300:   state_->ReleaseTxn();
As for the sequence of releasing txn and its lock and partition lock: should the sequence of ReleaseTxn() and ReleasePartitionLock() be reversed?  It seems these "locks" are acquired in the order of (1) txn lock (2) partition lock, so I'd expect those to be released in different order.  Would it make sense?


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@303
PS1, Line 303: const auto& op = state_->request()->op();
nit: could store the type of the operation instead since the operation as is isn't needed below?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 1
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 09 Mar 2021 22:32:47 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#5) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into participant and write ops for
transactional and non-transactional operations. For each transaction, we
try to acquire the partition lock in the ParticipantOp prepare phase of
BEGIN_TXN and release the lock when COMMIT_TXN or ABORT_TXN is applied.

Moreover, we take the partition lock for non-transactional write ops as
well to ensure we don’t have duplicate keys. If the partition lock
cannot be acquired, the transaction (or write op) is aborted or retried.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 471 insertions(+), 19 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/5
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 5
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#11) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 600 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/11
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 11
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Reviewed-on: http://gerrit.cloudera.org:8080/17159
Tested-by: Andrew Wong <aw...@cloudera.com>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 600 insertions(+), 16 deletions(-)

Approvals:
  Andrew Wong: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 12
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has uploaded a new patch set (#10) to the change originally created by Hao Hao. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch plumbs partition locks into write ops for transactional and
non-transactional operations. Upon attempting to write, we try to
acquire the partition lock in the WriteOp prepare phase, and upon
successfully applying the op, we transfer ownership of the lock to the
Txn that the write was a part of (or just release the partition lock if
the write was non-transactional). Txns release the lock when
FINALIZE_COMMIT or ABORT_TXN is applied.

We take the partition lock for non-transactional write ops as well to
ensure we don’t duplicate keys (e.g. if a transaction inserts key=1 and
then a non-transactional write inserts key=1 before the transaction
commits). If the partition lock cannot be acquired, the write op is
aborted or retried, based on the wait-die mechanics already in the lock
manager.

A flag is also introduced to disable this locking for tests that
currently expect support for concurrent transactions.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/lock_manager.h
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_bootstrap.cc
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
16 files changed, 600 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/10
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 10
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@813
PS3, Line 813: }
> Done
This is marked 'Done' but I don't see that a scenario with calling ParticipateInTransaction(..., ParticipantOpPB::BEGIN_TXN) for the same txn_id multiple times is added.  Did I miss anything?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_participant-itest.cc@930
PS3, Line 930: 
> Done
nit: this is marked as 'Done', and I saw TODO is added in other places.  Is it worth adding that comment here as well?


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc
File src/kudu/integration-tests/txn_participant-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/integration-tests/txn_participant-itest.cc@282
PS5, Line 282:         const auto& actual_txns = replicas[i]->tablet()->txn_participant()->GetTxnsForTests();
             :         ASSERT_EQ(txns.size(), actual_txns.size());
             :         ASSERT_EQ(txns, actual_txns);
nit: is this change essential?  If not, maybe it's better to return back to the version how it was in PS3?  I guess with this version as in PS5, if the assertion ever triggers (even with ASSERT_EVENTUALLY), there isn't as much useful information anymore because the assertion on the size strikes first.


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/ops/participant_op.cc@212
PS5, Line 212: RETURN_NOT_OK(replica->tablet()->AcquirePartitionLock(state_.get(), wait_mode));
How to release the partition lock if the following code kicks in when the driver run Status ParticipantOp::Prepare():

https://github.com/apache/kudu/blob/1eb2a2fbca329721ec7152714b7c95374754044b/src/kudu/tablet/ops/op_driver.cc#L298-L305

Basically, how to guarantee the lock is to be released in all possible outcomes in the OpDriver's code once the lock is taken?


http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h
File src/kudu/tablet/tablet.h:

http://gerrit.cloudera.org:8080/#/c/17159/5/src/kudu/tablet/tablet.h@172
PS5, Line 172:   // Similar to the above but for participant op.
             :   Status AcquirePartitionLock(ParticipantOpState* op_state,
             :                               LockManager::LockWaitMode wait_mode);
BTW, why do we need this method when we already have a similar one for WriteOpState?  Would it be possible to acquire partition lock only when performing a write operation?  If not, could you please add a blurb explaining why it's also necessary to acquire a lock while working with ParticipantOpState as well?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 5
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Fri, 26 Mar 2021 03:34:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hao Hao has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17159/1//COMMIT_MSG@13
PS1, Line 13: COMMI
> We'll also need to make sure the TxnOpDispatcher handles the error cases we
Done


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@95
PS1, Line 95: l acquired = partition_lock_.IsAcquired(&code);
> nit: based on the current code, this cannot be any other code, right? If so
Done


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@199
PS1, Line 199:  "ParticipantOp::Prepare
> nit: can use 'replica'
Done


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@200
PS1, Line 200: TRACE("PREPARE: Starting.");
> Shouldn't there be a 'break' before this line? Or are we taking the partiti
Ah, right, 'break' is needed. Thanks for catching it!


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@300
PS1, Line 300:       return Status::InvalidArgument("unknown op type");
> As for the sequence of releasing txn and its lock and partition lock: shoul
Yes, it makes sense. Updated it.


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/ops/participant_op.cc@303
PS1, Line 303: return Status::OK();
> nit: could store the type of the operation instead since the operation as i
Will do.


http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/17159/1/src/kudu/tablet/tablet.cc@223
PS1, Line 223: using std::endl;
> warning: using decl 'TabletServerErrorPB' is unused [misc-unused-using-decl
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 11 Mar 2021 18:57:01 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Hao Hao (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/17159

to look at the new patch set (#4).

Change subject: KUDU-2612: acquire and release partition lock
......................................................................

KUDU-2612: acquire and release partition lock

This patch introduces the logic for actual acquiring and releasing
the partition lock for transactions and non-transactional operations.
For each transaction, we try to acquire the partition lock in the
ParticipantOp prepare phase of BEGIN_TXN and release the lock when
COMMIT_TXN or ABORT_TXN is applied. Moreover, we take the parition
lock for non-transactional operations as well to ensure we don’t
have duplicate keys. If the partition lock cannot be acquired,
the transaction (or write op) will be aborted or retried.

Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
---
M src/kudu/client/batcher.cc
M src/kudu/integration-tests/fuzz-itest.cc
M src/kudu/integration-tests/txn_commit-itest.cc
M src/kudu/integration-tests/txn_participant-itest.cc
M src/kudu/integration-tests/txn_write_ops-itest.cc
M src/kudu/tablet/ops/participant_op.cc
M src/kudu/tablet/ops/participant_op.h
M src/kudu/tablet/ops/write_op.cc
M src/kudu/tablet/ops/write_op.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/txn_participant-test.cc
M src/kudu/tablet/txn_participant.h
M src/kudu/transactions/participant_rpc.cc
14 files changed, 465 insertions(+), 14 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/17159/4
-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 4
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 3:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc
File src/kudu/client/batcher.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/client/batcher.cc@530
PS3, Line 530:     //                Status::Abort() originated from TabletServerErrorPB::TXN_LOCKED_ABORT
             :     //                response are needlessly retried.
             : 
Isn't this handled above? Or do you mean TXN_LOCKED_RETRY? We'd probably be better off if we handled TXN_LOCKED_RETRY_OP explicitly and treated it as SERVICE_UNAVAILABLE or somesuch.


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc
File src/kudu/integration-tests/txn_write_ops-itest.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/integration-tests/txn_write_ops-itest.cc@1046
PS3, Line 1046: }
nit: for completeness sake, maybe commit `second_txn` and ensure we no longer have dispatchers? Same below?


http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h
File src/kudu/tablet/ops/participant_op.h:

http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.h@75
PS2, Line 75:  lock_manag
nit: the lock wasn't acquired by this op, it was acquired by txn_.


http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc
File src/kudu/tablet/ops/participant_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/2/src/kudu/tablet/ops/participant_op.cc@108
PS2, Line 108: else {
nit: use LOG(FATAL) or LOG(DFATAL).


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@67
PS3, Line 67: DEFINE_bool(enable_write_op_partition_lock, true,
            :             "Whether or not to enable partition lock for write op");
            : TAG_FLAG(enable_write_op_partition_lock, advanced);
            : TAG_FLAG(enable_write_op_partition_lock, experimental);
Could we just reuse --enable_txn_partition_lock? When might we want to set them to different values?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/ops/write_op.cc@569
PS3, Line 569:       DCHECK(false) << "unexpected error code " << code;
LOG(DFATAL) or LOG(FATAL)?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h
File src/kudu/tablet/txn_participant.h:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/tablet/txn_participant.h@220
PS3, Line 220:     partition_lock_ = std::move(partition_lock);
nit: maybe DCHECK or CHECK that we own the lock?


http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc
File src/kudu/transactions/participant_rpc.cc:

http://gerrit.cloudera.org:8080/#/c/17159/3/src/kudu/transactions/participant_rpc.cc@180
PS3, Line 180:       case TabletServerErrorPB::TXN_LOCKED_ABORT:
Can we also explicitly handle TXN_LOCKED_RETRY_OP?



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 3
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Thu, 11 Mar 2021 20:58:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2612: acquire and release partition lock

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/17159 )

Change subject: KUDU-2612: acquire and release partition lock
......................................................................


Patch Set 10: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc
File src/kudu/tablet/ops/write_op.cc:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/ops/write_op.cc@574
PS9, Line 574:     DCHECK(!s.ok()) << s.ToString();
             :     completio
> Done as a DCHECK
I guess DCHECK() wouldn't trigger in case of s.ok() because there is LOG(DFATAL) a couple of lines above, but if you are sure there aren't different code paths in release vs debug mode, it's totally fine with me.


http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h
File src/kudu/tablet/txn_participant.h:

http://gerrit.cloudera.org:8080/#/c/17159/9/src/kudu/tablet/txn_participant.h@115
PS9, Line 115: It is thus expected that repeat callers are taking
             :   // the same lock
> Done
Thank you!



-- 
To view, visit http://gerrit.cloudera.org:8080/17159
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If26733cae16810f3b3afd1fd05dcb984e6366939
Gerrit-Change-Number: 17159
Gerrit-PatchSet: 10
Gerrit-Owner: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Hao Hao <ha...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Apr 2021 23:20:00 +0000
Gerrit-HasComments: Yes