You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2016/06/04 00:16:51 UTC

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Hello Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/3309

to review the following change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................

KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

It's still a mystery why the Java tests sometimes trigger the locking
violations and the C++ tests don't. Per the backtrace in the bug report, any
GetTableLocations() call to a table whose tablets have been assigned but are
otherwise still under construction may trigger a violation, as
GetTableLocations() acquires a tablet lock in read mode.

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/3309/1
-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 7:

Build Started http://104.196.14.100/job/kudu-gerrit/1799/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 6:

Build Started http://104.196.14.100/job/kudu-gerrit/1784/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3309/2//COMMIT_MSG
Commit Message:

Line 15: GetTableLocations() call to a table whose tablets have been assigned but are
I'm not entirely following what GetTableLocations has to do with this bug - isn't the crash in ProcessPendingAssignments?


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 5: Verified+1

Overriding Jenkins, the one failure was a Java test (testDisconnect) in TSAN mode. Dan has been working on that one.

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has uploaded a new patch set (#6).

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................

KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

The lock misuse reared its head most often in the Java test
TestKuduTable.testGetLocations, but could trigger in just about any test
that didn't wait for table creation to finish before accessing the table.
For example, CreateTableITest.TestCreateWhenMajorityOfReplicasFailCreation
was a little flaky too. The backtrace was always the same:

cow_object.h:82] Check failed: lock_.HasReaders() || lock_.HasWriteLock()
    @     0x7fecb643e37b  kudu::CowObject<>::state() at ??:0
    @     0x7fecb6422fe9  kudu::master::CatalogManager::ProcessPendingAssignments() at ??:0
    @     0x7fecb6421d3a  kudu::master::CatalogManagerBgTasks::Run() at ??:0
    @     0x7fecb645c1f7  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7fecb645c15b  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7fecb645c104  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7fecb645bf2a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7fecb0b66d82  boost::function0<>::operator()() at ??:0
    @     0x7fecafba96c0  kudu::Thread::SuperviseThread() at ??:0
    @           0x423faa  __tsan_thread_start_func at ??:0
    @     0x7fecb28459d1  start_thread at ??:0
    @     0x7fecacd108fd  clone at ??:0

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/3309/6
-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 6: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Jean-Daniel Cryans,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3309

to look at the new patch set (#7).

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................

KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

The lock misuse reared its head most often in the Java test
TestKuduTable.testGetLocations, but could trigger in just about any test
that didn't wait for table creation to finish before accessing the table.
For example, CreateTableITest.TestCreateWhenMajorityOfReplicasFailCreation
was a little flaky too. The backtrace was always the same:

cow_object.h:82] Check failed: lock_.HasReaders() || lock_.HasWriteLock()
    @     0x7fecb643e37b  kudu::CowObject<>::state() at ??:0
    @     0x7fecb6422fe9  kudu::master::CatalogManager::ProcessPendingAssignments() at ??:0
    @     0x7fecb6421d3a  kudu::master::CatalogManagerBgTasks::Run() at ??:0
    @     0x7fecb645c1f7  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7fecb645c15b  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7fecb645c104  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7fecb645bf2a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7fecb0b66d82  boost::function0<>::operator()() at ??:0
    @     0x7fecafba96c0  kudu::Thread::SuperviseThread() at ??:0
    @           0x423faa  __tsan_thread_start_func at ??:0
    @     0x7fecb28459d1  start_thread at ??:0
    @     0x7fecacd108fd  clone at ??:0

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/3309/7
-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

The lock misuse reared its head most often in the Java test
TestKuduTable.testGetLocations, but could trigger in just about any test
that didn't wait for table creation to finish before accessing the table.
For example, CreateTableITest.TestCreateWhenMajorityOfReplicasFailCreation
was a little flaky too. The backtrace was always the same:

cow_object.h:82] Check failed: lock_.HasReaders() || lock_.HasWriteLock()
    @     0x7fecb643e37b  kudu::CowObject<>::state() at ??:0
    @     0x7fecb6422fe9  kudu::master::CatalogManager::ProcessPendingAssignments() at ??:0
    @     0x7fecb6421d3a  kudu::master::CatalogManagerBgTasks::Run() at ??:0
    @     0x7fecb645c1f7  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7fecb645c15b  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7fecb645c104  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7fecb645bf2a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7fecb0b66d82  boost::function0<>::operator()() at ??:0
    @     0x7fecafba96c0  kudu::Thread::SuperviseThread() at ??:0
    @           0x423faa  __tsan_thread_start_func at ??:0
    @     0x7fecb28459d1  start_thread at ??:0
    @     0x7fecacd108fd  clone at ??:0

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Reviewed-on: http://gerrit.cloudera.org:8080/3309
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Adar Dembo <ad...@cloudera.com>
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Adar Dembo: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 8
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3309/5//COMMIT_MSG
Commit Message:

PS5, Line 13: It's still a mystery why the Java tests
> Can't get a regression test in C++?
Ah. I had changed the commit description in PS4 (via gerrit) then blew it away when I pushed PS5.

Short answer: at least one C++ test was flaky due to this, and I thought that was good enough as far as regression test coverage is concerned.


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 7: Verified+1

Overriding Jenkins, another Java testDisconnect failure.

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 6:

(1 comment)

> (1 comment)
 > 
 > I wish there was an easier way to hide the data object so that we
 > can only access it through the lock object. It's too easy to make
 > the mistake what this patch is fixing when using COWLocks.

I think the complexity arises because sometimes we also want to lock objects by calling StartMutation()/AbortMutation()/CommitMutation() directly, after which we'd use one of the state accessors to make changes. It's usually done in places where lock objects are awkward to work with.

http://gerrit.cloudera.org:8080/#/c/3309/6/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

Line 525:   // Send the "create tablet request" to a particular peer.
> Comment needs updating -- it's for all peers of a particular tablet
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 1:

Build Started http://104.196.14.100/job/kudu-gerrit/1745/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 3:

Build Started http://104.196.14.100/job/kudu-gerrit/1761/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3309/5//COMMIT_MSG
Commit Message:

PS5, Line 13: It's still a mystery why the Java tests
Can't get a regression test in C++?


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 2:

Build Started http://104.196.14.100/job/kudu-gerrit/1753/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3309

to look at the new patch set (#4).

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................

KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

The lock misuse reared its head most often in the Java test
TestKuduTable.testGetLocations, but could trigger in just about any test
that didn't wait for table creation to finish before accessing the table.
For example, CreateTableITest.TestCreateWhenMajorityOfReplicasFailCreation
was a little flaky too. The backtrace was always the same:

cow_object.h:82] Check failed: lock_.HasReaders() || lock_.HasWriteLock() 
    @     0x7fecb643e37b  kudu::CowObject<>::state() at ??:0
    @     0x7fecb6422fe9  kudu::master::CatalogManager::ProcessPendingAssignments() at ??:0
    @     0x7fecb6421d3a  kudu::master::CatalogManagerBgTasks::Run() at ??:0
    @     0x7fecb645c1f7  boost::_mfi::mf0<>::operator()() at ??:0
    @     0x7fecb645c15b  boost::_bi::list1<>::operator()<>() at ??:0
    @     0x7fecb645c104  boost::_bi::bind_t<>::operator()() at ??:0
    @     0x7fecb645bf2a  boost::detail::function::void_function_obj_invoker0<>::invoke() at ??:0
    @     0x7fecb0b66d82  boost::function0<>::operator()() at ??:0
    @     0x7fecafba96c0  kudu::Thread::SuperviseThread() at ??:0
    @           0x423faa  __tsan_thread_start_func at ??:0
    @     0x7fecb28459d1  start_thread at ??:0
    @     0x7fecacd108fd  clone at ??:0

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/3309/4
-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 4:

Build Started http://104.196.14.100/job/kudu-gerrit/1777/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Kudu Jenkins (Code Review)" <ge...@cloudera.org>.
Kudu Jenkins has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 5:

Build Started http://104.196.14.100/job/kudu-gerrit/1779/

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 7: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 7
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Jean-Daniel Cryans (Code Review)" <ge...@cloudera.org>.
Jean-Daniel Cryans has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3309/5//COMMIT_MSG
Commit Message:

PS5, Line 13: It's still a mystery why the Java tests
> Ah. I had changed the commit description in PS4 (via gerrit) then blew it a
Alright good enough.


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/3309/2//COMMIT_MSG
Commit Message:

Line 15: GetTableLocations() call to a table whose tablets have been assigned but are
> I'm not entirely following what GetTableLocations has to do with this bug -
Yes, the crash is in ProcessPendingAssignments, but the DCHECK itself is:

  DCHECK(lock_.HasReaders() || lock_.HasWriteLock());

So we accessed an object without a lock while someone else had that object locked for reading or for writing. I don't think that "someone else" is the background thread, so I think it's got to be another client thread. A GetTableLocations() or GetTabletLocations() on the same table would lock some tablets for reading, and since the Java code in question does exactly that (create a table followed by two calls to GetTabletLocations() on the same table), it's conceivable that it's responsible.

It's strange, though, since the Java client is supposed to wait for the table to be created before continuing. That's why I'm going to spend some more time digging into this.


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 6: Verified+1

Another Java testDisconnect failure.

-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change.

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................


Patch Set 6:

(1 comment)

I wish there was an easier way to hide the data object so that we can only access it through the lock object. It's too easy to make the mistake what this patch is fixing when using COWLocks.

http://gerrit.cloudera.org:8080/#/c/3309/6/src/kudu/master/catalog_manager.h
File src/kudu/master/catalog_manager.h:

Line 525:   // Send the "create tablet request" to a particular peer.
Comment needs updating -- it's for all peers of a particular tablet


-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 6
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] KUDU-1473: fix some tablet lock usage in CatalogManager

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/3309

to look at the new patch set (#5).

Change subject: KUDU-1473: fix some tablet lock usage in CatalogManager
......................................................................

KUDU-1473: fix some tablet lock usage in CatalogManager

This was probably due to the refactoring done in commit 59ff89d. Now that
we're releasing tablet write locks as early as possible, we need to
reacquire them (in read mode) for some operations.

It's still a mystery why the Java tests sometimes trigger the locking
violations and the C++ tests don't. Per the backtrace in the bug report, any
GetTableLocations() call to a table whose tablets have been assigned but are
otherwise still under construction may trigger a violation, as
GetTableLocations() acquires a tablet lock in read mode.

Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
---
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
2 files changed, 41 insertions(+), 26 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/09/3309/5
-- 
To view, visit http://gerrit.cloudera.org:8080/3309
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8e24f6035f4d778995ea3f295396f5fbd760d6c6
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>