You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2017/12/16 03:28:27 UTC

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8859


Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................

KUDU-2115: remove unnecessary compaction selection check

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped. As such,
the verification hits false positives when other threads are able to
take the component_lock_ and modify the available rowsets.

This verification failure here is harmless, and the verification itself
so I've taken the block of code for it. Removal of this should be
harmless, as it still maintains the invariant that only things that are
available to be compacted are returned by PickRowSetsToCompact().

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing TestRandomOpSequence, which
schedules many compactions, fail consistently with the failed
verification. Upon making the fix, it passed.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/tablet.cc
1 file changed, 0 insertions(+), 18 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8859/2//COMMIT_MSG@23
PS2, Line 23: as the correct set of rowsets
            : would be returned by PickRowSetsToCompact()
> What is 'the correct set of rowsets'?  Does this mean returning a rowset wh
Ah, correct here means that rowsets that no longer exist (eg that have already been compacted) will not be returned.



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 20 Dec 2017 19:30:41 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626
PS5, Line 1626:   //  rs->set_has_been_compacted();
> oops
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:15:37 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc@a1341
PS3, Line 1341: 
Maybe, at least return an error in this case to preserve the original behavior.

I'm not sure it's safe otherwise -- I'll need to get better understanding of that mechanics to be sure.

As of now, I'm under impression that this check is to prevent compaction of the same rowset concurrently by multiple actors.  I might be wrong, but this change does not look safe to me, maybe just because of my limited knowledge of the internals.



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 20 Dec 2017 19:41:03 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8859/2//COMMIT_MSG@23
PS2, Line 23: as the correct set of rowsets
            : would be returned by PickRowSetsToCompact()
What is 'the correct set of rowsets'?  Does this mean returning a rowset which has already been selected for compaction by some other actor is harmless?  I'm not sure that that concurrent compacting the same rowset is harmless.



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 20 Dec 2017 19:19:19 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@19
PS3, Line 19: rowsets and the verification, the component_lock_ is dropped, allowing
            : the following race:
            : 
> The race wasn't immediately clear to me from the commit message but I looke
Yep, the biggest piece of this is that the component_lock_ is dropped while running compaction selection policy.

T1: makes copy {A,B} and drops component_lock_
T2: makes copy {A,B} and drops component_lock_

T1: considers {A,B}, and selects them for compaction, taking their compact_flush_lock_
T1: finishes compaction, removes rowsets from tree, drops {A,B}'s compact_flush_lock_

T2: considers {A,B}, and selects them for compaction, taking their compact_flush_lock_
T2: can't proceed because it can't find them in {A,B} in the rowset tree


http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@24
PS3, Line 24: T2:   snapshots the rowset tree (which still contains {A, B})
> Ignoring and proceeding could have the side effect of running a fairly usel
I'm not super convinced (a) would be able to avoid the race without holding the component lock while selecting, so I'm going with (b).



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 19:00:21 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
    unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
    IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
    still available
T2: sees that some of the selected rowsets would not be returned because
    they are missing from the currently available rowsets (DFATALs and
    aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Reviewed-on: http://gerrit.cloudera.org:8080/8859
Tested-by: Kudu Jenkins
Reviewed-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 170 insertions(+), 77 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Todd Lipcon: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 8
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 4:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@435
PS4, Line 435:   virtual Status DebugDump(std::vector<std::string> *lines = NULL) override;
> warning: default arguments on virtual or override methods are prohibited [g
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478
PS4, Line 478:   std::atomic<bool> has_been_compacted_;
> add doc
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@293
PS4, Line 293:   // TODO: unit test me
> warning: missing username/bug in TODO [google-readability-todo]
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@350
PS4, Line 350:   virtual Status DebugDump(std::vector<std::string> *lines = NULL) override;
> warning: default arguments on virtual or override methods are prohibited [g
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@376
PS4, Line 376:   double DeltaStoresCompactionPerfImprovementScore(DeltaCompactionType type) const override {
> warning: parameter 'type' is unused [misc-unused-parameters]
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441
PS4, Line 441:   std::atomic<bool> has_been_compacted_;
> doc
Done


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237
PS4, Line 237: rowset
> i think better to be explicit and say that it has been removed from the Row
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:11:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 7: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 10 Jan 2018 01:58:37 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#7).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
    unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
    IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
    still available
T2: sees that some of the selected rowsets would not be returned because
    they are missing from the currently available rowsets (DFATALs and
    aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 170 insertions(+), 77 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/7
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#3).

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................

KUDU-2115: remove unnecessary compaction selection check

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped. As such,
the verification hits false negatives if other threads are able to take
the component_lock_ and compact the available rowsets.

The verification itself is unnecessary, as rowsets deemed missing are
not returned by PickRowSetsToCompact() anyway. I've removed this check.

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/tablet.cc
1 file changed, 0 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/5/src/kudu/tablet/tablet.cc@1626
PS5, Line 1626:   //  rs->set_has_been_compacted();
oops



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 23:12:45 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#2).

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................

KUDU-2115: remove unnecessary compaction selection check

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped. As such,
the verification hits false negatives if other threads are able to take
the component_lock_ and modify the available rowsets.

The verification itself is unnecessary, as the correct set of rowsets
would be returned by PickRowSetsToCompact(), even if the verification
were to fail, so I've removed it.

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/tablet.cc
1 file changed, 0 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/2
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc@a1341
PS3, Line 1341: 
> Maybe, at least return an error in this case to preserve the original behav
I don't think we needed to abort these compactions in the first place. The output parameter `picked` would correctly return only the selected rowsets that still exist while holding the component_lock_.

Although it's a good question, why was this here in the first place?



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 20 Dec 2017 20:05:31 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 4:

(3 comments)

looks like all the tests failed :)

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h
File src/kudu/tablet/diskrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/diskrowset.h@478
PS4, Line 478:   std::atomic<bool> has_been_compacted_;
add doc


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/memrowset.h@441
PS4, Line 441:   std::atomic<bool> has_been_compacted_;
doc


http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/8859/4/src/kudu/tablet/rowset.h@237
PS4, Line 237: rowset
i think better to be explicit and say that it has been removed from the RowSetTree



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 04 Jan 2018 21:23:18 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#6).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
    unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
    IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
    still available
T2: sees that some of the selected rowsets would not be returned because
    they are missing from the currently available rowsets (DFATALs and
    aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 168 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/6
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#5).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
    unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
    IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
    still available
T2: sees that some of the selected rowsets would not be returned because
    they are missing from the currently available rowsets (DFATALs and
    aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset-test.cc
M src/kudu/tablet/memrowset.cc
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
9 files changed, 168 insertions(+), 75 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/5
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@19
PS3, Line 19: rowsets and the verification, the component_lock_ is dropped. As such,
            : the verification hits false negatives if other threads are able to take
            : the component_lock_ and compact the available rowsets.
The race wasn't immediately clear to me from the commit message but I looked at the code and think I get it. To verify that I'm understanding correctly, is the race as follows?

T1: runs PickRowSetsToCompact, picks {A, B}, locks them
T1: begins compaction

T2: starts running PickRowSetsToCompact
T2: - snapshots the tree (which still contains {A, B})
T2: - drops the component_lock after snapshot

T1: finishes the compaction and removes {A, B} from the RowSetTree and unlocks their compact_flush_lock

T2: - iterates over the rowsets and sees {A, B} as "IsAvailableForCompaction" (because they've become unlocked)
T2: - selects them for compaction, but they are no longer in the tree (FATALs)

if so, can you add this to the commit message?


http://gerrit.cloudera.org:8080/#/c/8859/3//COMMIT_MSG@24
PS3, Line 24: not returned by PickRowSetsToCompact() anyway. I've removed this check.
Ignoring and proceeding could have the side effect of running a fairly useless compaction. Instead, could we either:

(a) move the filtering by IsAvailableForCompaction from RowSetInfo::CollectOrdered() into Tablet::PickRowSetsToCompact() while holding the component lock? That way we wouldn't possibly select a RowSet which has been later removed from the tree. (this would require some signature changes on CompactionPolicy to just take a vector of RowSets instead of a RowSetTree but that's probably fine)

or

(b) have some flag in the RowSet like 'removed' and make IsAvailableForCompaction check both the compact_flush_lock_ and the 'removed_' flag? That way in the case that something is removed from the tree, the flag would get set and prevent it from getting selected in a later compaction policy invocation.



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Thu, 21 Dec 2017 17:32:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: remove unnecessary compaction selection check

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: remove unnecessary compaction selection check
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/8859/3/src/kudu/tablet/tablet.cc@a1341
PS3, Line 1341: 
> I don't think we needed to abort these compactions in the first place. The 
All right.  Since it's not clear to me what was the idea behind the original behavior and that DFATAL log, I don't think I'm qualified to review this patch.  You need to get an opinion on this change from experts in this area.



-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 20 Dec 2017 20:39:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8859 )

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................


Patch Set 6:

IWYU and clock sync issues


-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 05 Jan 2018 00:38:42 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2115: avoid compacting already-compacted rowsets

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8859

to look at the new patch set (#4).

Change subject: KUDU-2115: avoid compacting already-compacted rowsets
......................................................................

KUDU-2115: avoid compacting already-compacted rowsets

Tablets will perform compaction selection on a copy of the currently
available rowsets in order to avoid holding the component_lock_ for the
duration of rowset selection.

It is then verified that nothing else compacted the selected rowsets by
iterating over the selected rowsets and checking that they still exist
to be compacted.

However, the initial selection of rowsets is performed on a snapshotted
copy of the available rowsets, and in between the snapshot of the
rowsets and the verification, the component_lock_ is dropped, allowing
the following race:

T1: runs PickRowSetsToCompact, picks {A, B}, begins compaction
T2: runs PickRowSetsToCompact:
T2:   snapshots the rowset tree (which still contains {A, B})
T2:   drops component_lock_ after taking the snapshot
T1: finishes compaction and removes {A, B} from the rowset tree,
    unlocking each rowset's compact_flush_lock_
T2: iterates over the rowset tree and sees {A, B} as
    IsAvailableForCompaction(), since they have been unlocked
T2: selects {A, B} as a part of its compaction
T2: grabs the component_lock_ to verify that the selected rowsets are
    still available
T2: sees that some of the selected rowsets would not be returned because
    they are missing from the currently available rowsets (DFATALs and
    aborts the compaction)

I verified the diagnosis by placing a random sleep just after making the
copy in PickRowSetsToCompact() and seeing
TabletServerDiskFailureTest.TestRandomOpSequence, which schedules many
compactions, fail consistently with the failed verification. Upon making
the fix, this passes.

This can lead to scheduling a fairly useless compaction. This patch
adds an API to RowSets to mark itself as compacted, and ensure that when
selecting rowsets to compact, check that rowset candidates have not
already been compacted.

To verify the solution, I've added a small test that schedules several
concurrent compactions. Upon adding the aforementioned randomized
delay and removing a couple sanity D/CHECKs (including the above
verification), I saw the test fail, leading to an unexpected number of
rows left in the tablet.

Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
---
M src/kudu/tablet/compaction-test.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
6 files changed, 155 insertions(+), 69 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/59/8859/4
-- 
To view, visit http://gerrit.cloudera.org:8080/8859
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I4dab330d61facb18717f6faf179f9b94a9e55236
Gerrit-Change-Number: 8859
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>