You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/10/10 00:10:16 UTC

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/11639


Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................

[compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

This adds a new metric 'average_diskrowset_height' that reflects how
uncompacted a tablet replica is. This metric is obtained by integrating
the height function with respect to the by-data-size probability
distribution used by the compaction policy.

To implement the integration, I piggy-backed on the function that
computes the CDF for the rowset layout, since computing the integral
requires computing the CDF and it seemed wasteful to first compute the
CDF, then go through an almost entirely similar bit of logic to compute
the average height.

Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
7 files changed, 284 insertions(+), 53 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/11639/1
-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................


Patch Set 3: Code-Review+2

(1 comment)

We discussed this offline: currently the placement of the computation of the rowset height is such that it is performed every time we swap in new rowsets. Conceptually this makes sense, but a quirk that exists is that we'll swap rowsets twice during merges/flushes, rendering one of these computations unnecessary. Will's added a TODO to cache the height to avoid this.

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc@265
PS1, Line 265: push_back
> Can't because that constructor is private. I'm not sure why it is private..
Ah interesting, yeah I agree about not changing it. TIL emplace_back doesn't work with private constructors



-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 20:11:53 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................


Patch Set 4:

> Patch Set 4:
> 
> (1 comment)
> 
> > We discussed this offline: currently the placement of the computation of the rowset height is such that it is performed every time we swap in new rowsets. Conceptually this makes sense, but a quirk that exists is that we'll swap rowsets twice during merges/flushes, rendering one of these computations unnecessary. Will's added a TODO to cache the height to avoid this.
> 
> Yes, I'm also curious about what kind of overhead this adds. Do we only swap rowsets during an IO operation (i.e. flush or compaction)? Or at other times too?

Right, AFAICT AtomicSwapRowSets*() is always accompanied by some sort of flush.


-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 18:02:34 +0000
Gerrit-HasComments: No

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................


Patch Set 1: Code-Review+1

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/compaction_policy-test.cc@314
PS1, Line 314:   /* A rowset that's one key wide.
             :    * |
             :    */
             :   EXPECT_EQ(0.0, ComputeAverageRowsetHeight({ { "A", "A" } }));
             : 
             :   /* A single rowset.
             :    * [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" } }));
             : 
             :   /* Two rowsets with no empty space between.
             :    * [ --- ][ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" }, { "B", "C" } }));
             : 
             : 
             :   /* Three rowsets with no empty spaces between.
             :    * [ --- ][ --- ][ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "B", "C" },
             :                                               { "C", "D" } }));
             : 
             :   /* Two rowsets with empty space between them.
             :    * [ --- ]       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" }, { "C", "D" } }));
             : 
             :   /* Three rowsets with empty space between them.
             :    * [ --- ]       [ --- ]       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "C", "D" },
             :                                               { "E", "F" } }));
             : 
             :   /* Three rowsets with empty space between them, and one is a single key.
             :    * [ --- ]       |       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "C", "C" },
             :                                               { "D", "D" } }));
Just making sure I understand, this is the source of trouble for KUDU-1400, right? These are all considered "compacted" so they aren't considered?


http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc@265
PS1, Line 265: push_back
nit: use emplace_back()


http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/tablet.cc@1050
PS1, Line 1050: 
nit: spurious change



-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Tue, 16 Oct 2018 17:33:40 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/compaction_policy-test.cc
File src/kudu/tablet/compaction_policy-test.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/compaction_policy-test.cc@314
PS1, Line 314:   /* A rowset that's one key wide.
             :    * |
             :    */
             :   EXPECT_EQ(0.0, ComputeAverageRowsetHeight({ { "A", "A" } }));
             : 
             :   /* A single rowset.
             :    * [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" } }));
             : 
             :   /* Two rowsets with no empty space between.
             :    * [ --- ][ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" }, { "B", "C" } }));
             : 
             : 
             :   /* Three rowsets with no empty spaces between.
             :    * [ --- ][ --- ][ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "B", "C" },
             :                                               { "C", "D" } }));
             : 
             :   /* Two rowsets with empty space between them.
             :    * [ --- ]       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" }, { "C", "D" } }));
             : 
             :   /* Three rowsets with empty space between them.
             :    * [ --- ]       [ --- ]       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "C", "D" },
             :                                               { "E", "F" } }));
             : 
             :   /* Three rowsets with empty space between them, and one is a single key.
             :    * [ --- ]       |       [ --- ]
             :    */
             :   EXPECT_EQ(1.0, ComputeAverageRowsetHeight({ { "A", "B" },
             :                                               { "C", "C" },
             :                                               { "D", "D" } }));
> Just making sure I understand, this is the source of trouble for KUDU-1400,
Yes, if these rowsets are small and numerous then the fixed cost of processing them in scans can dominate the scan and cause scan requests to time out.


http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/rowset_info.cc@265
PS1, Line 265: push_back
> nit: use emplace_back()
Can't because that constructor is private. I'm not sure why it is private...but I trust there's a reason so I don't want to change it for the sake of this small, loosely-related-to-the-patch optimization in code that already works fine.


http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/11639/1/src/kudu/tablet/tablet.cc@1050
PS1, Line 1050: 
> nit: spurious change
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 16 Oct 2018 17:53:24 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................

[compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

This adds a new metric 'average_diskrowset_height' that reflects how
uncompacted a tablet replica is. This metric is obtained by integrating
the height function with respect to the by-data-size probability
distribution used by the compaction policy.

To implement the integration, I piggy-backed on the function that
computes the CDF for the rowset layout, since computing the integral
requires computing the CDF and it seemed wasteful to first compute the
CDF, then go through an almost entirely similar bit of logic to compute
the average height.

Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Reviewed-on: http://gerrit.cloudera.org:8080/11639
Reviewed-by: Andrew Wong <aw...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
7 files changed, 287 insertions(+), 53 deletions(-)

Approvals:
  Andrew Wong: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/11639 )

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................


Patch Set 4:

(1 comment)

> We discussed this offline: currently the placement of the computation of the rowset height is such that it is performed every time we swap in new rowsets. Conceptually this makes sense, but a quirk that exists is that we'll swap rowsets twice during merges/flushes, rendering one of these computations unnecessary. Will's added a TODO to cache the height to avoid this.

Yes, I'm also curious about what kind of overhead this adds. Do we only swap rowsets during an IO operation (i.e. flush or compaction)? Or at other times too?

http://gerrit.cloudera.org:8080/#/c/11639/4/src/kudu/tablet/rowset_info.cc
File src/kudu/tablet/rowset_info.cc:

http://gerrit.cloudera.org:8080/#/c/11639/4/src/kudu/tablet/rowset_info.cc@168
PS4, Line 168: void CheckCollectOrderedCorrectness(const vector<RowSetInfo>& min_key,
May want to rename this; it was named to match CollectOrdered.



-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 17 Oct 2018 00:41:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Alexey Serbin, Attila Bukor, Kudu Jenkins, Andrew Wong, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/11639

to look at the new patch set (#3).

Change subject: [compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted
......................................................................

[compaction] KUDU-2056: Expose a metric for how much a tablet needs to be compacted

This adds a new metric 'average_diskrowset_height' that reflects how
uncompacted a tablet replica is. This metric is obtained by integrating
the height function with respect to the by-data-size probability
distribution used by the compaction policy.

To implement the integration, I piggy-backed on the function that
computes the CDF for the rowset layout, since computing the integral
requires computing the CDF and it seemed wasteful to first compute the
CDF, then go through an almost entirely similar bit of logic to compute
the average height.

Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
---
M src/kudu/tablet/compaction_policy-test.cc
M src/kudu/tablet/compaction_policy.cc
M src/kudu/tablet/rowset_info.cc
M src/kudu/tablet/rowset_info.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet_metrics.cc
M src/kudu/tablet/tablet_metrics.h
7 files changed, 287 insertions(+), 53 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/39/11639/3
-- 
To view, visit http://gerrit.cloudera.org:8080/11639
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I98493b901d37bb278167ba2fe98d322a86a1f0f9
Gerrit-Change-Number: 11639
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Attila Bukor <ab...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>