You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Mike Percy (Code Review)" <ge...@cloudera.org> on 2018/02/07 20:10:59 UTC

[kudu-CR] KUDU-2274. consensus: Remove ConsensusMetadata thread safety

Hello Alexey Serbin,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9246

to review the following change.


Change subject: KUDU-2274. consensus: Remove ConsensusMetadata thread safety
......................................................................

KUDU-2274. consensus: Remove ConsensusMetadata thread safety

ConsensusMetadata doesn't need to be thread-safe, even though it is
ref-counted, because it is required to be externally synchronized.
This patch replaces the mutex with a DFAKE_MUTEX from the thread
collision warner utility class in order to easily detect concurrent
access due to buggy external sychronization.

This class could be restructured to remove the distinction between
"locked" and "unlocked" methods by using the recursive version of the
thread collision warner in every public method. Perhaps we should do
that instead.

Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
2 files changed, 28 insertions(+), 40 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/9246/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>

[kudu-CR] KUDU-2274. Shut down tombstoned replica when replacing it

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. Shut down tombstoned replica when replacing it
......................................................................


Patch Set 4: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 4
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Sat, 10 Feb 2018 02:32:40 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2274. Shut down tombstoned replica when replacing it

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9246

to look at the new patch set (#2).

Change subject: KUDU-2274. Shut down tombstoned replica when replacing it
......................................................................

KUDU-2274. Shut down tombstoned replica when replacing it

Failing to shut down a tombstoned replica after copying it can lead to
unfortunate interleavings resulting in the replica ending up in an
inconsistent state. This actually occurred in a test environment,
although it proved very hard to reproduce.

This patch includes several changes in addition to shutting down
tombstoned replicas before replacing them:

* Remove the thread safety properties of the ConsensusMetadata class

  ConsensusMetadata doesn't need to be thread-safe, even though it is
  ref-counted, because it is required to be externally synchronized.
  This patch replaces the mutex with a DFAKE_MUTEX from the thread
  collision warner utility class in order to easily detect concurrent
  access due to buggy external sychronization.

* Also improve destructor state checks in TabletReplica.

* Fix another case of unlocked cmeta access by TSTabletManager.

These fixes were verified by running tombstoned_voting-stress-test with
4 CPU stress threads on the dist-test cluster after applying only the
ConsensusMetadata thread-safety portion of this patch, and then again
with the unlocked access fix and shutdown portions of this patch.

After removing the cmeta mutex only (186/200 failed):
http://dist-test.cloudera.org/job?job_id=mpercy.1518077234.135005

This full patch (200/200 succeeded):
http://dist-test.cloudera.org/job?job_id=mpercy.1518078690.66599

Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 87 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/46/9246/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 2
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-2274. consensus: Remove ConsensusMetadata thread safety

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. consensus: Remove ConsensusMetadata thread safety
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

PS1: 
> Yep, overall it LGTM.  I think it's a movement in the right direction.  By 
Agreed and done



-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Thu, 08 Feb 2018 08:48:44 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2274. Shut down tombstoned replica when replacing it

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. Shut down tombstoned replica when replacing it
......................................................................

KUDU-2274. Shut down tombstoned replica when replacing it

Failing to shut down a tombstoned replica after copying it can lead to
unfortunate interleavings resulting in the replica ending up in an
inconsistent state. This actually occurred in a test environment,
although it proved very hard to reproduce.

This patch includes several changes in addition to shutting down
tombstoned replicas before replacing them:

* Remove the thread safety properties of the ConsensusMetadata class

  ConsensusMetadata doesn't need to be thread-safe, even though it is
  ref-counted, because it is required to be externally synchronized.
  This patch replaces the mutex with a DFAKE_MUTEX from the thread
  collision warner utility class in order to easily detect concurrent
  access due to buggy external sychronization.

* Also improve destructor state checks in TabletReplica.

* Fix another case of unlocked cmeta access by TSTabletManager.

These fixes were verified by running tombstoned_voting-stress-test with
4 CPU stress threads on the dist-test cluster after applying only the
ConsensusMetadata thread-safety portion of this patch, and then again
with the unlocked access fix and shutdown portions of this patch.

After removing the cmeta mutex only (186/200 failed):
http://dist-test.cloudera.org/job?job_id=mpercy.1518077234.135005

This full patch (200/200 succeeded):
http://dist-test.cloudera.org/job?job_id=mpercy.1518078690.66599

Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Reviewed-on: http://gerrit.cloudera.org:8080/9246
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/consensus/consensus_meta.cc
M src/kudu/consensus/consensus_meta.h
M src/kudu/consensus/raft_consensus.cc
M src/kudu/consensus/raft_consensus.h
M src/kudu/tablet/tablet_replica.cc
M src/kudu/tserver/ts_tablet_manager.cc
6 files changed, 87 insertions(+), 155 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 5
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-2274. consensus: Remove ConsensusMetadata thread safety

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. consensus: Remove ConsensusMetadata thread safety
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

PS1: 
> Is this just a WIP patch?
Well, semi WIP, I wanted your opinion but I tend to agree that we should restructure the class and use a recursive fake mutex (per the commit message).


http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h@236
PS1, Line 236: TODO(mpercy) Doc this
> Indeed: it would be nice to explain the reasoning behind having this fake m
oops, will do


http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h@237
PS1, Line 237: lock_
> nit: maybe, rename it to fake_lock_?
good idea



-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 07 Feb 2018 23:04:00 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2274. consensus: Remove ConsensusMetadata thread safety

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. consensus: Remove ConsensusMetadata thread safety
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

PS1: 
> Well, semi WIP, I wanted your opinion but I tend to agree that we should re
Yep, overall it LGTM.  I think it's a movement in the right direction.  By my opinion, once converted into non-thread safe with fake mutex for additional consistency, this class needs a bit of restructuring and cleanup.



-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Wed, 07 Feb 2018 23:12:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2274. consensus: Remove ConsensusMetadata thread safety

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9246 )

Change subject: KUDU-2274. consensus: Remove ConsensusMetadata thread safety
......................................................................


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h
File src/kudu/consensus/consensus_meta.h:

PS1: 
Is this just a WIP patch?

There are multiple methods in this file called xxx_unlocked() and XxxUnlocked().  By my understanding, if it's not a WIP patch, it would be nice to get rid of names asserting on thread-safe behavior of this class, removing variations of the unlocked methods.

If I'm not mistaken, it's possible to embed methods protected by scoped fake lock.


http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h@236
PS1, Line 236: TODO(mpercy) Doc this
Indeed: it would be nice to explain the reasoning behind having this fake mutex.


http://gerrit.cloudera.org:8080/#/c/9246/1/src/kudu/consensus/consensus_meta.h@237
PS1, Line 237: lock_
nit: maybe, rename it to fake_lock_?



-- 
To view, visit http://gerrit.cloudera.org:8080/9246
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia8d086c3fba52826ebe0d3a44842d53ecb6a9265
Gerrit-Change-Number: 9246
Gerrit-PatchSet: 1
Gerrit-Owner: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Comment-Date: Wed, 07 Feb 2018 20:42:45 +0000
Gerrit-HasComments: Yes