You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Adar Dembo (Code Review)" <ge...@cloudera.org> on 2017/09/07 11:09:56 UTC

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Hello Dan Burkert, Todd Lipcon,

I'd like you to do a code review.  Please visit

    http://gerrit.cloudera.org:8080/7995

to review the following change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................

catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets

RWCLock::HasWriteLock never worked properly because last_writer_tid_ wasn't
reset when the lock was released. Well, it worked properly (though it's far
less useful) in non-DEBUG builds, but then the various HasWriteLock DCHECKs
are compiled out. Who knew?

When fixed, TableInfo::AddRemoveTablets broke; we had been reading some
tablet metadata without first acquiring a lock! This was frustrasting to
address because it meant retreading over the most error-prone aspect of the
catalog manager: for operations that require several writes in order to
"publish" their results, in what order should those writes occur? I tried my
best to get this right, but who knows how I did...

Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
---
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/util/cow_object.h
M src/kudu/util/rwc_lock.cc
M src/kudu/util/rwc_lock.h
6 files changed, 60 insertions(+), 16 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/7995/1
-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7995/2/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

PS2, Line 71: tablets.push_back(tablet);
If the call to table->AddRemoveTablets({}, tablets) at line 93 has gone, do we need populating the tablets at all?

From the other side, if removing that call at line 93, do we have some other place to cover that scenario?


-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2:

gotcha; sounds good.

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7995

to look at the new patch set (#3).

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................

catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets

RWCLock::HasWriteLock never worked properly because last_writer_tid_ wasn't
reset when the lock was released. Well, it worked properly (though it's far
less useful) in non-DEBUG builds, but then the various HasWriteLock DCHECKs
are compiled out. Who knew?

When fixed, TableInfo::AddRemoveTablets broke; we had been reading some
tablet metadata without first acquiring a lock! This was frustrasting to
address because it meant retreading over the most error-prone aspect of the
catalog manager: for operations that require several writes in order to
"publish" their results, in what order should those writes occur? I tried my
best to get this right, but who knows how I did...

Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
---
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/util/cow_object.h
M src/kudu/util/rwc_lock.cc
M src/kudu/util/rwc_lock.h
6 files changed, 95 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/7995/3
-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 3
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2:

> yikes.  does this indicate a lack of test coverage around
 > concurrent alter and tablet drops?  The lack of read locks when
 > creating tablets seems benign since those won't be visible, but the
 > concurrent increment schema / decrement schema that could happen
 > from dropping and altering a tablet at once seems like a big race.

Not sure I follow the race; aren't the increment/decrement schema operations protected by the table spinlock? The tablet's reported schema version is in-memory only so it's not protected by the tablet metadata lock, just by its spinlock (and the table spinlock, for the subsequent increment/decrement operations).

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 4: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has submitted this change and it was merged.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets

RWCLock::HasWriteLock never worked properly because last_writer_tid_ wasn't
reset when the lock was released. Well, it worked properly (though it's far
less useful) in non-DEBUG builds, but then the various HasWriteLock DCHECKs
are compiled out. Who knew?

When fixed, TableInfo::AddRemoveTablets broke; we had been reading some
tablet metadata without first acquiring a lock! This was frustrasting to
address because it meant retreading over the most error-prone aspect of the
catalog manager: for operations that require several writes in order to
"publish" their results, in what order should those writes occur? I tried my
best to get this right, but who knows how I did...

Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Reviewed-on: http://gerrit.cloudera.org:8080/7995
Tested-by: Kudu Jenkins
Reviewed-by: Dan Burkert <da...@apache.org>
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/util/cow_object.h
M src/kudu/util/rwc_lock.cc
M src/kudu/util/rwc_lock.h
6 files changed, 96 insertions(+), 22 deletions(-)

Approvals:
  Dan Burkert: Looks good to me, approved
  Alexey Serbin: Looks good to me, approved
  Kudu Jenkins: Verified



-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 5
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7995/2/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

PS2, Line 71: tablets.push_back(tablet);
> If the call to table->AddRemoveTablets({}, tablets) at line 93 has gone, do
Bear in mind that the TableInfo only has raw pointers to its TabletInfos. The expectation is that the CatalogManager stores strong refs to the TabletInfos in its maps; 'tablets' serves as a stand-in for those maps in this test.

AFAICT, this is just test setup, so the AddRemoveTablets() call that dropped the tablets wasn't providing any value.


http://gerrit.cloudera.org:8080/#/c/7995/2/src/kudu/util/cow_object.h
File src/kudu/util/cow_object.h:

PS2, Line 60: DCHECK(lock_.HasWriteLock());
> Why not to put that check right into WriteUnlock()/CommitUnlock()/UpgradeTo
Good idea. I've added those checks into RWCLock itself, though I'm keeping these checks here because I think they're still useful (with the possibly exception of this particular check, though consistency bears out).


-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 4: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7995/2/src/kudu/master/catalog_manager-test.cc
File src/kudu/master/catalog_manager-test.cc:

PS2, Line 71: tablets.push_back(tablet);
> Bear in mind that the TableInfo only has raw pointers to its TabletInfos. T
I see -- thanks for the clarification.


-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 1:

yikes.  does this indicate a lack of test coverage around concurrent alter and tablet drops?  The lack of read locks when creating tablets seems benign since those won't be visible, but the concurrent increment schema / decrement schema that could happen from dropping and altering a tablet at once seems like a big race.

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2: Code-Review+2

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Hello Dan Burkert, Kudu Jenkins,

I'd like you to reexamine a change.  Please visit

    http://gerrit.cloudera.org:8080/7995

to look at the new patch set (#4).

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................

catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets

RWCLock::HasWriteLock never worked properly because last_writer_tid_ wasn't
reset when the lock was released. Well, it worked properly (though it's far
less useful) in non-DEBUG builds, but then the various HasWriteLock DCHECKs
are compiled out. Who knew?

When fixed, TableInfo::AddRemoveTablets broke; we had been reading some
tablet metadata without first acquiring a lock! This was frustrasting to
address because it meant retreading over the most error-prone aspect of the
catalog manager: for operations that require several writes in order to
"publish" their results, in what order should those writes occur? I tried my
best to get this right, but who knows how I did...

Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
---
M src/kudu/master/catalog_manager-test.cc
M src/kudu/master/catalog_manager.cc
M src/kudu/master/catalog_manager.h
M src/kudu/util/cow_object.h
M src/kudu/util/rwc_lock.cc
M src/kudu/util/rwc_lock.h
6 files changed, 96 insertions(+), 22 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/95/7995/4
-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Dan Burkert (Code Review)" <ge...@cloudera.org>.
Dan Burkert has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 1:

Code looks good.

-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: No

[kudu-CR] catalog manager: fix unprotected data access in TableInfo::AddRemoveTablets

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change.

Change subject: catalog_manager: fix unprotected data access in TableInfo::AddRemoveTablets
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/7995/2/src/kudu/util/cow_object.h
File src/kudu/util/cow_object.h:

PS2, Line 60: DCHECK(lock_.HasWriteLock());
Why not to put that check right into WriteUnlock()/CommitUnlock()/UpgradeToCommitLock() ?


-- 
To view, visit http://gerrit.cloudera.org:8080/7995
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ifcc5d3a9b985210f1fdd6f0495326fa3eb707841
Gerrit-PatchSet: 2
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Dan Burkert <da...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-HasComments: Yes