You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Andrew Wong (Code Review)" <ge...@cloudera.org> on 2020/10/12 06:25:07 UTC

[kudu-CR] wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Andrew Wong has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16581


Change subject: wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold

NO_BUILD
wip: needs tests

Currently each tablet will wait at least 2 minutes between flushing
DMSs, even if there are several DMSs that are older than 2 minutes. This
means that for tablets with several dozen rowsets and updates across the
entire tablet, it may take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
13 files changed, 139 insertions(+), 107 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/1
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16581

to look at the new patch set (#3).

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes between flushing
DMSs, even if there are several DMSs that are older than 2 minutes. This
means that for tablets with several dozen rowsets and updates across the
entire tablet, it may take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 184 insertions(+), 111 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/3
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 3
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16581/4/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/16581/4/src/kudu/tablet/tablet_replica_mm_ops.cc@264
PS4, Line 264:   }
> Nit: Perhaps update the docs in FlushOpPerfImprovementPolicy::SetPerfImprov
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Oct 2020 17:57:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16581

to look at the new patch set (#6).

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes (controlled by
--flush_threshold_secs) between flushing DMSs, even if there are several
DMSs that are older than 2 minutes in a given tablet. This means that
for tablets with several dozen rowsets and updates across the entire
tablet, it could take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 234 insertions(+), 150 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/6
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16581

to look at the new patch set (#2).

Change subject: wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

wip KUDU-3195: flush when any DMS in the tablet is older than the time threshold

DONT_BUILD
wip: needs tests

Currently each tablet will wait at least 2 minutes between flushing
DMSs, even if there are several DMSs that are older than 2 minutes. This
means that for tablets with several dozen rowsets and updates across the
entire tablet, it may take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
13 files changed, 139 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/2
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 2
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins (120)

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 5: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Oct 2020 17:59:07 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes (controlled by
--flush_threshold_secs) between flushing DMSs, even if there are several
DMSs that are older than 2 minutes in a given tablet. This means that
for tablets with several dozen rowsets and updates across the entire
tablet, it could take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Reviewed-on: http://gerrit.cloudera.org:8080/16581
Tested-by: Kudu Jenkins
Reviewed-by: Alexey Serbin <as...@cloudera.com>
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 234 insertions(+), 150 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Alexey Serbin: Looks good to me, approved

-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 7
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 4: Code-Review+1


-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Oct 2020 17:09:33 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 5:

(12 comments)

Just did a quick look.  Overall looks good, but I guess I'll need to take a second look into the essential changes in tablet.cc

http://gerrit.cloudera.org:8080/#/c/16581/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16581/5//COMMIT_MSG@9
PS5, Line 9: least 2 minutes
nit: maybe, it's worth pointing that it's controlled by the --flush_threshold_secs flag, i.e. it's not hard-coded?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@238
PS5, Line 238: the
nit: drop


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@239
PS5, Line 239: size_t* size_bytes, MonoTime* creation_time
nit: maybe, it's worth mentioning what's the expectation on filling in the values of the output parameters?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@239
PS5, Line 239: DeltaMemStoreInfo
nit: maybe, name it as GetDMSInfo()/GetDeltaMemStoreInfo() or alike, pointing that the method retrieves info on a DMS.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc@867
PS5, Line 867:   shared_lock<rw_spinlock> lock(component_lock_);
             :   if (dms_exists_.Load()) {
nit: do we expect high contention on component_lock_?  If so, then maybe try first to check the atomic before taking a shared lock (yes, it's uglier, but it might help with lock contention):

if (dms_exists_.Load()) {
  shared_lock<rw_spinlock> lock(component_lock_);
  if (dms_exists_.Load()) {
    ...
  }
}


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc@873
PS5, Line 873:   *size_bytes = 0;
             :   *creation_time = MonoTime();
Is it really necessary to bother filling in these if DMS isn't there?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/memrowset.h@381
PS5, Line 381:     *size_bytes = 0;
             :     *creation_time = MonoTime();
nit: why to bother filling in these if returning 'false' anyways?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc@2234
PS5, Line 2234:     // To facilitate memory-based flushing when under memory pressure, we
              :     // define a score that's part memory and part WAL retention bytes.
              :     double score = dms_size_bytes * mem_weight + replay_size_bytes * (100 - mem_weight);
Should it be moved into a method/function?  I guess it might be re-used in other places as well.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet_replica_mm_ops.cc@268
PS5, Line 268: MonoTime::Now() - earliest_dms_time
What if FindBestDMSToFlush() didn't find any DMS?  Does this stay semantically valid?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4335
PS5, Line 4335: KUDU-3145
KUDU-3195 ?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4356
PS5, Line 4356: 10
nit: maybe, using base::NumCPUs() would look a bit more natural?


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4362
PS5, Line 4362: 5
Just to make sure I understand: does this test create enough updates/deltas to span for longer that 5 flush thresholds if there is a regression?  Maybe, it's worth adding a comment to provide more information on the expected speed of DMS flushing given the parameters of this test scenario.



-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Oct 2020 19:21:05 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16581

to look at the new patch set (#4).

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes between flushing
DMSs, even if there are several DMSs that are older than 2 minutes. This
means that for tablets with several dozen rowsets and updates across the
entire tablet, it may take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 211 insertions(+), 142 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/4
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Grant Henke, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/16581

to look at the new patch set (#5).

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................

KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Currently each tablet will wait at least 2 minutes between flushing
DMSs, even if there are several DMSs that are older than 2 minutes. This
means that for tablets with several dozen rowsets and updates across the
entire tablet, it may take hours to flush all the deltas.

Rather than waiting for 2 minutes since the last flush time before
considering time-based flushing, this patch tracks the creation time of
every DMS and flushes as long as there is a DMS that is older than 2
minutes in the tablet.

Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
---
M src/kudu/tablet/delta_tracker.cc
M src/kudu/tablet/delta_tracker.h
M src/kudu/tablet/deltamemstore.cc
M src/kudu/tablet/deltamemstore.h
M src/kudu/tablet/diskrowset.cc
M src/kudu/tablet/diskrowset.h
M src/kudu/tablet/memrowset.h
M src/kudu/tablet/mock-rowsets.h
M src/kudu/tablet/rowset.h
M src/kudu/tablet/tablet.cc
M src/kudu/tablet/tablet.h
M src/kudu/tablet/tablet_replica_mm_ops.cc
M src/kudu/tablet/tablet_replica_mm_ops.h
M src/kudu/tserver/tablet_server-test.cc
14 files changed, 219 insertions(+), 148 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/81/16581/5
-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 5
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Grant Henke (Code Review)" <ge...@cloudera.org>.
Grant Henke has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16581/4/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/16581/4/src/kudu/tablet/tablet_replica_mm_ops.cc@264
PS4, Line 264:   FlushOpPerfImprovementPolicy::SetPerfImprovementForFlush(
Nit: Perhaps update the docs in FlushOpPerfImprovementPolicy::SetPerfImprovementForFlush given its usage of `elapsed_time` is was based on time since flush and is now based on the time since the earliest dms.



-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 4
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Tue, 13 Oct 2020 17:13:34 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 6:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/16581/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16581/5//COMMIT_MSG@9
PS5, Line 9: least 2 minutes
> nit: maybe, it's worth pointing that it's controlled by the --flush_thresho
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h
File src/kudu/tablet/delta_tracker.h:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@238
PS5, Line 238: a D
> nit: drop
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@239
PS5, Line 239: . Otherwise, returns false and doesn't upda
> nit: maybe, it's worth mentioning what's the expectation on filling in the 
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.h@239
PS5, Line 239: ich it was create
> nit: maybe, name it as GetDMSInfo()/GetDeltaMemStoreInfo() or alike, pointi
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc
File src/kudu/tablet/delta_tracker.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc@867
PS5, Line 867:   // Check dms_exists_ first to avoid unnecessary contention on
             :   // component_lock_. We ne
> nit: do we expect high contention on component_lock_?  If so, then maybe tr
Good point. We do this in other methods, so we might as well do so here.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/delta_tracker.cc@873
PS5, Line 873:       *size_bytes = dms_->EstimateSize();
             :       *creation_time = dms_->c
> Is it really necessary to bother filling in these if DMS isn't there?
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/memrowset.h
File src/kudu/tablet/memrowset.h:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/memrowset.h@381
PS5, Line 381:     return false;
             :   }
> nit: why to bother filling in these if returning 'false' anyways?
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/rowset.h
File src/kudu/tablet/rowset.h:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/rowset.h@429
PS5, Line 429:   virtual Status DebugDump(std::vector<std::string> *lines = nullptr) override;
> warning: use nullptr [modernize-use-nullptr]
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/rowset.h@436
PS5, Line 436:     return nullptr;
> warning: use nullptr [modernize-use-nullptr]
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc@2234
PS5, Line 2234:     // To facilitate memory-based flushing when under memory pressure, we
              :     // define a score that's part memory and part WAL retention bytes.
              :     double score = dms_size_bytes * mem_weight + replay_size_bytes * (100 - mem_weight);
> Should it be moved into a method/function?  I guess it might be re-used in 
It is not reused anywhere right now, and I actually question its value. I'm not sure the assessment on KUDU-2238 is complete, given they waited hours for a flush to happen. For now, I'm leaving it as is.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet_replica_mm_ops.cc
File src/kudu/tablet/tablet_replica_mm_ops.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet_replica_mm_ops.cc@268
PS5, Line 268: now - earliest_dms_time).ToMillisec
> What if FindBestDMSToFlush() didn't find any DMS?  Does this stay semantica
Negative MonoDeltas are valid, though I'll update this to be more conservative with the MonoTime::Max() case.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc
File src/kudu/tserver/tablet_server-test.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4335
PS5, Line 4335: KUDU-3195
> KUDU-3195 ?
Done


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4356
PS5, Line 4356: 
> nit: maybe, using base::NumCPUs() would look a bit more natural?
Actually this is fairly unimportant; I'll just stick with a single MM thread.


http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tserver/tablet_server-test.cc@4362
PS5, Line 4362: r
> Just to make sure I understand: does this test create enough updates/deltas
Yep, there are 100 rowsets, and therefore 100 DMSs to flush. Before, we would wait 1 second between scheduling each one, so with a single thread, that would take 100s.



-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 14 Oct 2020 00:00:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-3195: flush when any DMS in the tablet is older than the time threshold

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/16581 )

Change subject: KUDU-3195: flush when any DMS in the tablet is older than the time threshold
......................................................................


Patch Set 6: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc
File src/kudu/tablet/tablet.cc:

http://gerrit.cloudera.org:8080/#/c/16581/5/src/kudu/tablet/tablet.cc@2234
PS5, Line 2234:     // To facilitate memory-based flushing when under memory pressure, we
              :     // define a score that's part memory and part WAL retention bytes.
              :     double score = dms_size_bytes * mem_weight + replay_size_bytes * (100 - mem_weight);
> It is not reused anywhere right now, and I actually question its value. I'm
SGTM



-- 
To view, visit http://gerrit.cloudera.org:8080/16581
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id05202bf6a4685f4d79db11ef8ebb0f91f6316b4
Gerrit-Change-Number: 16581
Gerrit-PatchSet: 6
Gerrit-Owner: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Grant Henke <gr...@apache.org>
Gerrit-Reviewer: Kudu Jenkins (120)
Gerrit-Reviewer: Tidy Bot (241)
Gerrit-Comment-Date: Wed, 14 Oct 2020 01:46:26 +0000
Gerrit-HasComments: Yes