You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/10/06 06:29:39 UTC

[kudu-CR] [compaction] Small improvements to compaction policy tests

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11605


Change subject: [compaction] Small improvements to compaction policy tests
......................................................................

[compaction] Small improvements to compaction policy tests

To learn more about compaction policy, I spent some time reading over
compaction_policy-test.cc, and in doing so I made a bunch of small
improvements. I also added one test covering a facet of the policy
that's pretty important but isn't obviously covered in the design docs
or the existing tests.

There are no functional non-test changes in this patch. I added
scaled-down versions of a couple of slow tests that can run all the
time, and made one slow test only run when slow tests are allowed.

Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
---
M src/kudu/tablet/compaction_policy-test.cc
1 file changed, 131 insertions(+), 65 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/11605/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@117
PS2, Line 117: // Similar to the above, but with some small overlap between adjacent
             : // rowsets.
In compaction-policy parlance, this means is testing compaction on a "short" (keyspace height is small) tablet, right? Whereas TestSignificantOverlap is "tall"?


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@131
PS2, Line 131: %09d", i * 10000),
             :         StringPrintf("%09d
Was this change important?


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@178
PS2, Line 178: the
nit: remove


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@185
PS2, Line 185: // compacting {A, B, C} results in the same quality score as {A, B}, 0.67, but
             : // uses more budget. By penalizing the wider solution {A, B, C}, we ensure we
             : // don't waste I/O.
Not really related to this change, but thought I'd ask. Conceptually, I get that we're reducing the summed width an equal amount in both cases, but in one case we're using 2 "units" of IO and in the other we're using 3, but I haven't found a doc or comment explaining what 'quality' is in this case. Do they exist somewhere? Is there a formula for 'quality'?



-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 09 Oct 2018 07:29:49 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Attila Bukor, Kudu Jenkins, Andrew Wong, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11605

to look at the new patch set (#3).

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................

[compaction] Small improvements to compaction policy tests

To learn more about compaction policy, I spent some time reading over
compaction_policy-test.cc, and in doing so I made a bunch of small
improvements. I also added one test covering a facet of the policy
that's pretty important but isn't obviously covered in the design docs
or the existing tests.

There are no functional non-test changes in this patch. I added
scaled-down versions of a couple of slow tests that can run all the
time, and made one slow test only run when slow tests are allowed.

Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
---
M src/kudu/tablet/compaction_policy-test.cc
1 file changed, 131 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/05/11605/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................


Patch Set 3: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Oct 2018 16:14:03 +0000
Gerrit-HasComments: No

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................


Patch Set 2:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@88
PS2, Line 88: so the quality
            :   // score, which should be
> nit: "so the quality score should be"
Boy that comment was a mess. I rekajiggered the punctuation and wording.


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@117
PS2, Line 117: // Similar to the above, but with some small overlap between adjacent
             : // rowsets.
> In compaction-policy parlance, this means is testing compaction on a "short
The max height is 2 in this case, and 3 in significant overlap, and is not the main difference between these tests. This test is meant to test that we don't rowset compact if there's only a small overlap, like the above test is that we don't compact when there's none, and the below test is that we do compact when there's a lot.


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@131
PS2, Line 131: %09d", i * 10000),
             :         StringPrintf("%09d
> Was this change important?
The point of it was to make the comment fit better before I decided to write the integers and not the fully-padded strings. But I left it since it's still correct, and less arbitrary because it's exactly the length needed (9999 * 10000 + 11000 = 100001000 has 9 digits).


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@178
PS2, Line 178: the
> nit: remove
Done


http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@185
PS2, Line 185: // compacting {A, B, C} results in the same quality score as {A, B}, 0.67, but
             : // uses more budget. By penalizing the wider solution {A, B, C}, we ensure we
             : // don't waste I/O.
> Not really related to this change, but thought I'd ask. Conceptually, I get
Check out docs/design-docs/compaction-policy.md. I don't think it uses the word 'quality' like the code does but it describes the ideas. The formula for the quality of a rowset compaction of a set of rowsets X is

    quality(X) = (sum of widths of rowsets in X) - (width of the union of the rowsets in X)

and the width of a rowset (or any set of keys) is the probability that a key falls between the set's min key and max key.

The probability distribution we use is one that weights the keyspace by how much data there is. So if [k_0, k_1] and [k_2, k_3] are key intervals, and [k_0, k_1] has 2x as much data in it, the width of [k_0, k_1] is 2x that of [k_2, k_3].

I think Todd and a guy who left a long, long time ago were the experts on this, so I'm hoping to bootcamp a few of us into familiarity using these smaller patches, before I try to make some larger changes to the compaction policy.



-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 09 Oct 2018 15:29:02 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Attila Bukor (Code Review)" <ge...@cloudera.org>.
Attila Bukor has posted comments on this change. ( http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11605/2/src/kudu/tablet/compaction_policy-test.cc@88
PS2, Line 88: so the quality
            :   // score, which should be
nit: "so the quality score should be"



-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 09 Oct 2018 12:47:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] Small improvements to compaction policy tests

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11605 )

Change subject: [compaction] Small improvements to compaction policy tests
......................................................................

[compaction] Small improvements to compaction policy tests

To learn more about compaction policy, I spent some time reading over
compaction_policy-test.cc, and in doing so I made a bunch of small
improvements. I also added one test covering a facet of the policy
that's pretty important but isn't obviously covered in the design docs
or the existing tests.

There are no functional non-test changes in this patch. I added
scaled-down versions of a couple of slow tests that can run all the
time, and made one slow test only run when slow tests are allowed.

Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Reviewed-on: http://gerrit.cloudera.org:8080/11605
Tested-by: Kudu Jenkins
Reviewed-by: Attila Bukor <ab...@apache.org>
---
M src/kudu/tablet/compaction_policy-test.cc
1 file changed, 131 insertions(+), 65 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Attila Bukor: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/11605
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8f7a0e2c69f301d7fc7ca2fac657569ea240d4e3
Gerrit-Change-Number: 11605
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>