You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "David Ribeiro Alves (Code Review)" <ge...@cloudera.org> on 2018/03/14 21:40:07 UTC

[kudu-CR] Simplify OpId/Timestamp assignment and make it atomic

David Ribeiro Alves has posted comments on this change. ( http://gerrit.cloudera.org:8080/7221 )

Change subject: Simplify OpId/Timestamp assignment and make it atomic
......................................................................


Patch Set 8:

(36 comments)

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h
File src/kudu/consensus/consensus-test-util.h:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@64
PS17, Line 64:   OpId* id = msg->mutable_id();
             :   id->set_term(term);
> nit: usually it's not a good idea to have 'using' in header files, unless t
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@67
PS17, Line 67: 
> As I see, the pattern is always CreateDummyReplicateWithOpId().release().
I essentially split CreateDummyReplicate in 2 versions one that sets an opid and one that doesn't. I can refactor the pointer ownership in a follow up if you feel strongly, but would like to keep that out of this straightforward change.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@82
PS17, Line 82: inline RaftPeerPB FakeRaftPeerPB(const std::string& uuid) {
> ditto: maybe, just return a raw pointer.
same


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@115
PS17, Line 115:         continue;
> Instead, is it possible to rely on the mode of the object pointed to by the
not sure I follow...


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus-test-util.h@133
PS17, Line 133: _
> an extra space
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_meta.cc
File src/kudu/consensus/consensus_meta.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_meta.cc@153
PS17, Line 153: return leader_uuid_;
> Maybe, just use DCHECK() here?  As I understand, the committed configuratio
tbh honest I don't see a problem in having this here and being extra defensive. the point of this patch is that the pending config is no longer assigned an op_id before being marked as pendign which must be set before marking it as commited. this is here to make sure we always enforce that last bit.
added a comment to that regard.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h
File src/kudu/consensus/consensus_queue.h:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h@231
PS17, Line 231:   // it is alive and making progress.
> update this to indicate it also does assignment of OpId and Timestamp
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.h@238
PS17, Line 238:                         bool* more_pending);
              : 
              :   // Called by the consensus implementation to update the queu
> Is it possible just to rely on the status of the queue and have a single me
the latter. I pondered the "uglyness" of having two methods/extra params versus just relying on the state of the queue, but ultimately decided on this to make it clear/explicit and have extra checks.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc
File src/kudu/consensus/consensus_queue.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@361
PS17, Line 361:   // Maintain a thread-safe copy of necessary members.
              :   OpId preceding_id;
> per comment below about GetNextOpId having a side effect, here's an example
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@421
PS17, Line 421:     // We try to get the follower's next_index from our log.
> nit: indent
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/consensus_queue.cc@440
PS17, Line 440:                                         << "while preparing peer request: "
              :                                         << s.ToString() << ". Destination peer: "
              :    
> hrm, this seems a little surprising that GetNextOpId has this side effect r
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/quorum_util.cc
File src/kudu/consensus/quorum_util.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/quorum_util.cc@158
PS17, Line 158: 
> add: DCHECK(type == COMMITTED_CONFIG || type == PENDING_CONFIG)
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.h
File src/kudu/consensus/raft_consensus.h:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.h@449
PS17, Line 449: t, 
> must
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc
File src/kudu/consensus/raft_consensus.cc:

PS17: 
> Does it make sense to add some specific test for this new behavior or it's 
for writes the behavior is undistinguishable. for non-writes the behavior is now "correct". there's a patch further down the series that makes sure this is actually what happens.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@648
PS17, Line 648: 
> or just: if (round->replicate_msg()->op_type() == CHANGE_CONFIG_OP) { ... }
went with todds suggestion


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@648
PS17, Line 648: 
> although this is right, it looks confusing because round probably isn't a c
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@662
PS17, Line 662:   // Check if the pending Raft config has an OpId and whether the index is less than the
> Due to the method name, seems less surprising to add: DCHECK_EQ(CHANGE_CONF
went with todds naming suggestion


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@673
PS17, Line 673:     return Status::OK();
> nit: extra line
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@691
PS17, Line 691: h
> nit: no need for inner parentheses
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2609
PS17, Line 2609: ClearLeaderU
> this and new_config can be const refs
new_config can't we set the opid below before we set it as the committed config


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2622
PS17, Line 2622: 
> This could be const reference as well.
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2623
PS17, Line 2623: bool RaftConsensus::HasLeaderUnlocked() const {
> nit: add a space after 'and'
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2624
PS17, Line 2624: 
> nit: replace with colon?
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2633
PS17, Line 2633: const bool RaftConsensus::HasVotedCurrentTermUnlocked() const {
               :   DCHECK(lock_.is_locked());
> it should still have an opid right? ie if we get to this path it should hav
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/raft_consensus.cc@2844
PS17, Line 2844: 
               : 
               : 
> Since this is just for comparing configs in the if() closure below, move it
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager-test.cc
File src/kudu/consensus/time_manager-test.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager-test.cc@a176
PS17, Line 176: 
> Maybe, add an assertion instead:
this is testing pinning safetime advancement, which no longer happens


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc
File src/kudu/consensus/time_manager.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@110
PS17, Line 110:   //
              :   //       The only opportunity for unrepeatable reads in this setup is if an old leader
> hrm, is this method still even very useful? Maybe it would be clearer to ju
would prefer to keep this, even if now it's just really checking the stage. as the docs say it's necessary for leader leases, also consensus doesn't directly interact with the clock anymore.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@294
PS17, Line 294: 
              : 
> this looks wrong. you are falling through to the next case which breaks any
you're right that this looked confusing. fixed that. the point is to either get the current clock value or the last safe ts. hopefully it's clear now.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/consensus/time_manager.cc@307
PS17, Line 307: 
> can you DCHECK_EQ(mode_, LEADER) here?
Done


http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@267
PS8, Line 267:       prepare_state_ = PREPARED;
> it's sketching me out that the state transition to PREPARED happens under t
Left a comment related to that in LOC 330. The point is that the transaction is now started after being submitted to consensus, and consensus replication and its callback will race with this thread, so we need to make sure we've started the txn before marking it PREPARED signaling it's ready to move to apply.
Done.


http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@295
PS8, Line 295:       // Register a dummy transaction with mvcc manager that we'll abort later.
> Did you see this comment and the one above?
Done


http://gerrit.cloudera.org:8080/#/c/7221/8/src/kudu/tablet/transactions/transaction_driver.cc@295
PS8, Line 295:       // Register a dummy transaction with mvcc manager that we'll abort later.
> instead of using a dummy transaction, would it be possible to register it e
hum, that might work, but for us to have different timestamps for the same txn we'd likely have to add a timestamp state or mode so that we don't believe the timestamp elsewhere if its not definitive.

I'll just bite the bullet and add a pinning mechanism to mvcc (this was a lazy way of doing that).

Update: Added a scoped clean time advancement pin mechanism.


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc
File src/kudu/tablet/transactions/transaction_driver.cc:

http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@336
PS17, Line 336: 
> s/rule/invariant/
Done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@360
PS17, Line 360: 
> s/constraint/above invariant/
Done
mvcc can't know of all transactions, but if does crash if we try to either submit a transaction with a ts lower than clean time or if we commit a transaction with a ts lower than clean time


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@371
PS17, Line 371: $0)", s.ToStrin
> This variable masks a variable of the same name in an outer scope
ah, good catch. done


http://gerrit.cloudera.org:8080/#/c/7221/17/src/kudu/tablet/transactions/transaction_driver.cc@384
PS17, Line 384: {
> can we still get to this case?
I don't see why not



-- 
To view, visit http://gerrit.cloudera.org:8080/7221
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2bedcc5c3708d06f48131f34ea14c835867b4800
Gerrit-Change-Number: 7221
Gerrit-PatchSet: 8
Gerrit-Owner: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: David Ribeiro Alves <da...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 14 Mar 2018 21:40:07 +0000
Gerrit-HasComments: Yes