You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Will Berkeley (Code Review)" <ge...@cloudera.org> on 2018/03/08 01:13:03 UTC

[kudu-CR] [WIP] KUDU-2289: Tablet deletion should be throttled

Will Berkeley has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9551


Change subject: [WIP] KUDU-2289: Tablet deletion should be throttled
......................................................................

[WIP] KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
reuests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet was run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

WIP because I'm still doing some tests, and I haven't thought
enough about whether the pool's queue size should be limited.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 131 insertions(+), 36 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 1
Gerrit-Owner: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#7).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/integration-tests/delete_table-itest.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
5 files changed, 148 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/7
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#6).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/integration-tests/delete_table-itest.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
5 files changed, 204 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/6
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 8: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 19:05:59 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

(5 comments)

Could you add a test for this? I'm confident that it hasn't regressed any functionality, but it'd be good for there to be a test that the throttling actually works as you'd expect.

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc@807
PS3, Line 807: context, req, resp
> I don't understand how it's guaranteed that these parameters are not yet fr
I think they're guaranteed to live until the call to context->RespondSuccess().


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h@142
PS3, Line 142:   Status DeleteTablet(const std::string& tablet_id,
Could we make the synchronous version private? AFAICT it's now only used in an itest; perhaps it can use a Synchronizer to produce a cb that can be waited on?


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@278
PS3, Line 278:   virtual void DisableCallback() {
This doesn't need to be virtual, does it?


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@840
PS3, Line 840:       new DeleteTabletRunnable(this, tablet_id, delete_type, cas_config_index, cb));
Nit: use std::make_shared to make the runnable.


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@848
PS3, Line 848:   if (s.IsServiceUnavailable()) {
             :     cb(s, TabletServerErrorPB::THROTTLED);
             :     return;
             :   }
             :   cb(s, TabletServerErrorPB::UNKNOWN_ERROR);
Simpler as:

  cb(s, s.IsServiceUnavailable() ? THROTTLED : UNKNOWN_ERROR);



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 16 Mar 2018 23:26:48 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1191
PS5, Line 1191: 
              :   // Despite sending more DeleteTablet requests than (service queue length + num service threads),
              :   // the master should never have had to retry a DeleteTablet request.
> All good points. Perhaps bug Mike or Todd for ideas? If no one can suggest 
yea, I'm not convinced this needs a separate unit test since it's sort of "correct by construction" given that we've separately tested thread pools, etc.



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 30 Mar 2018 01:27:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc@807
PS3, Line 807: context, req, resp
I don't understand how it's guaranteed that these parameters are not yet freed at the time the the callback is called upon successful deletion of the tablet.  Could you please clarify on that?



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 16 Mar 2018 05:35:08 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

(1 comment)

> > Do you have any advice on how to test this? I'm not setting a
 > queue length, the idea being that DeleteTablets can wait in the
 > queue instead of making the master constantly retry them, so the
 > DeleteTablet RPCs will no be throttled. I can reproduce symptoms of
 > the original problem in CreateTableStressTest.CreateAndDeleteBigTable
 > by turning up the number of tablets in the big table-- there'll be
 > service-queue-full rejections of CreateTablet and DeleteTablet
 > calls in the logs-- and those go away for DeleteTablets when this
 > change is applied. DeleteTablet is an admin service RPC, so I
 > can't, for example, Ping the service to see if the queue is full.
 > 
 > Hmm, good question. You could install metrics on the new thread
 > pool (see ThreadPoolMetrics) and look at the amount of time each
 > task spends queued; before your change it'd be small (just waiting
 > for a thread to start), and after it'd be large (need to wait for
 > prior delete tablets to finish). But that's a little bit brittle.

Maybe, you could set --num_tablets_to_delete_simultaneously=1, run with a single tablet server, inject a delay into TabletServiceAdminImpl::DeleteTable() or around, create a table with multiple tablets, and then remove it, capturing the readings of the METRIC_handler_latency_kudu_tserver_TabletServerAdminService_DeleteTablet (e.g., see the DeleteTableITest.TestUnknownTabletsAreNotDeleted scenario).  The idea is to make sure the metrics readings go up until reaching the total number of tablets, and that happens not faster than it should.  The same might be done for --num_tablets_to_delete_simultaneously=2, etc.

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc@807
PS3, Line 807: context, req, resp
> Thanks Adar. Here's a relevant snippet from the comment on RpcContext::Resp
Thank you for the explanation.



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 20 Mar 2018 18:41:36 +0000
Gerrit-HasComments: Yes

[kudu-CR] [WIP] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#2).

Change subject: [WIP] KUDU-2289: Tablet deletion should be throttled
......................................................................

[WIP] KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
reuests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet was run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

WIP because I'm still doing some tests, and I haven't thought
enough about whether the pool's queue size should be limited.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 133 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 2
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 7:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9551/6/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/6/src/kudu/integration-tests/delete_table-itest.cc@1153
PS6, Line 1153: 
> Still a bad test.
Removed.


http://gerrit.cloudera.org:8080/#/c/9551/7/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/7/src/kudu/integration-tests/delete_table-itest.cc@117
PS7, Line 117: METRIC_DECLARE_counter(rpcs_queue_overflow);
> Not needed anymore?
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 18:57:58 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 4:

Thanks, Adar and Alexey. I think I was able to make a test that shouldn't be too brittle. Let me know what you think.


-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Mar 2018 03:48:23 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9551/7/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/7/src/kudu/integration-tests/delete_table-itest.cc@117
PS7, Line 117: METRIC_DECLARE_counter(rpcs_queue_overflow);
Not needed anymore?



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 7
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 05 Apr 2018 18:37:23 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

(6 comments)

Do you have any advice on how to test this? I'm not setting a queue length, the idea being that DeleteTablets can wait in the queue instead of making the master constantly retry them, so the DeleteTablet RPCs will no be throttled. I can reproduce symptoms of the original problem in CreateTableStressTest.CreateAndDeleteBigTable by turning up the number of tablets in the big table-- there'll be service-queue-full rejections of CreateTablet and DeleteTablet calls in the logs-- and those go away for DeleteTablets when this change is applied. DeleteTablet is an admin service RPC, so I can't, for example, Ping the service to see if the queue is full.

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc
File src/kudu/tserver/tablet_service.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/tablet_service.cc@807
PS3, Line 807: context, req, resp
> I think they're guaranteed to live until the call to context->RespondSucces
Thanks Adar. Here's a relevant snippet from the comment on RpcContext::RespondSuccess() in rpc_context.h:

 // After this method returns, this RpcContext object is destroyed. The request
 // and response protobufs are also destroyed.

Likewise, the context, request, and response objects are also destroyed when calling the other RpcContext::Respond* methods.


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h
File src/kudu/tserver/ts_tablet_manager.h:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.h@142
PS3, Line 142:   Status DeleteTablet(const std::string& tablet_id,
> Could we make the synchronous version private? AFAICT it's now only used in
DeleteTabletRunnable calls DeleteTablet. I could make DeleteTabletRunnable a friend of TSTabletManager and make DeleteTablet private, but that feels like overkill because DeleteTabletRunnable just needs access to one method: a synchronous DeleteTablet().


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@278
PS3, Line 278:   virtual void DisableCallback() {
> This doesn't need to be virtual, does it?
Done


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@321
PS3, Line 321: max_open_threads
> This should be max_delete_threads.
Done


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@840
PS3, Line 840:       new DeleteTabletRunnable(this, tablet_id, delete_type, cas_config_index, cb));
> Nit: use std::make_shared to make the runnable.
Done


http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@848
PS3, Line 848:   if (s.IsServiceUnavailable()) {
             :     cb(s, TabletServerErrorPB::THROTTLED);
             :     return;
             :   }
             :   cb(s, TabletServerErrorPB::UNKNOWN_ERROR);
> Simpler as:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 19 Mar 2018 19:00:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#3).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 133 insertions(+), 36 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc
File src/kudu/tserver/ts_tablet_manager.cc:

http://gerrit.cloudera.org:8080/#/c/9551/3/src/kudu/tserver/ts_tablet_manager.cc@321
PS3, Line 321: max_open_threads
This should be max_delete_threads.



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 16 Mar 2018 00:41:51 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 4:

> Do you have any advice on how to test this? I'm not setting a queue length, the idea being that DeleteTablets can wait in the queue instead of making the master constantly retry them, so the DeleteTablet RPCs will no be throttled. I can reproduce symptoms of the original problem in CreateTableStressTest.CreateAndDeleteBigTable by turning up the number of tablets in the big table-- there'll be service-queue-full rejections of CreateTablet and DeleteTablet calls in the logs-- and those go away for DeleteTablets when this change is applied. DeleteTablet is an admin service RPC, so I can't, for example, Ping the service to see if the queue is full.

Hmm, good question. You could install metrics on the new thread pool (see ThreadPoolMetrics) and look at the amount of time each task spends queued; before your change it'd be small (just waiting for a thread to start), and after it'd be large (need to wait for prior delete tablets to finish). But that's a little bit brittle.


-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 19 Mar 2018 23:22:26 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Alexey Serbin (Code Review)" <ge...@cloudera.org>.
Alexey Serbin has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

> (1 comment)
 > 
 > > > Do you have any advice on how to test this? I'm not setting a
 > > queue length, the idea being that DeleteTablets can wait in the
 > > queue instead of making the master constantly retry them, so the
 > > DeleteTablet RPCs will no be throttled. I can reproduce symptoms
 > of
 > > the original problem in CreateTableStressTest.CreateAndDeleteBigTable
 > > by turning up the number of tablets in the big table-- there'll
 > be
 > > service-queue-full rejections of CreateTablet and DeleteTablet
 > > calls in the logs-- and those go away for DeleteTablets when this
 > > change is applied. DeleteTablet is an admin service RPC, so I
 > > can't, for example, Ping the service to see if the queue is full.
 > >
 > > Hmm, good question. You could install metrics on the new thread
 > > pool (see ThreadPoolMetrics) and look at the amount of time each
 > > task spends queued; before your change it'd be small (just
 > waiting
 > > for a thread to start), and after it'd be large (need to wait for
 > > prior delete tablets to finish). But that's a little bit brittle.
 > 
 > Maybe, you could set --num_tablets_to_delete_simultaneously=1, run
 > with a single tablet server, inject a delay into TabletServiceAdminImpl::DeleteTable()
 > or around, create a table with multiple tablets, and then remove
 > it, capturing the readings of the METRIC_handler_latency_kudu_tserver_TabletServerAdminService_DeleteTablet
 > (e.g., see the DeleteTableITest.TestUnknownTabletsAreNotDeleted
 > scenario).  The idea is to make sure the metrics readings go up
 > until reaching the total number of tablets, and that happens not
 > faster than it should.  The same might be done for
 > --num_tablets_to_delete_simultaneously=2, etc.

Oh, I meant '... inject a delay into  TabletServiceAdminImpl::DeleteTablet() or around ...'


-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Tue, 20 Mar 2018 18:42:53 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1191
PS5, Line 1191:   ASSERT_EVENTUALLY([&]() {
              :       TServerDetails* ts = ts_map_[cluster_->tablet_server(0)->uuid()];
              :       vector<string> tablets;
> I looked at this more and it's a bad test. I'm still struggling to figure o
All good points. Perhaps bug Mike or Todd for ideas? If no one can suggest a good alternative, you could drop the test altogether.



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Mar 2018 18:15:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, Todd Lipcon, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#8).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 147 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/8
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 8
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 6:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1179
PS5, Line 1179:   
> This is the default and can be omitted.
Done


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1182
PS5, Line 1182:   int64_t num_queue_overflow_from_create;
> Is this actually necessary? Won't the Create() call wait for a leader to be
I didn't realize it worked that way. Done.


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1184
PS5, Line 1184: _http_ho
> Nit: there's only one replica per tablet though, so here "tablet" might be 
Done


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1186
PS5, Line 1186:       &METRIC_rpcs_queue_overflow,
> Nit: could be scoped inside the ASSERT_EVENTUALLY.
Done


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1191
PS5, Line 1191:   ASSERT_EVENTUALLY([&]() {
              :       TServerDetails* ts = ts_map_[cluster_->tablet_server(0)->uuid()];
              :       vector<string> tablets;
> Interesting approach. Could you run the following?
I looked at this more and it's a bad test. I'm still struggling to figure out how to test this. There's supposed to be two benefits:

1. Preventing DeleteTablets from causing service queue overflows by using service threads for I/O.
2. Controlling the number of simultaneous DeleteTablets going on at once, since they do I/O.

1 is very hard to test, since it's not reliable to make a test that doesn't overflow the service queue with the pool, and that does overflow it pre-change, since the service queue can overflow anytime enough RPCs are fired in quick succession; it just requires a service thread not to get scheduled to pick up something off the queue while e.g. the master thread shooting off DeleteTablets keeps running.  I see this happen pretty frequently even just in debug builds since the master rapid-fires the DeleteTablets once the table is deleted.

2 feels a little silly to test to me, since it follows from how the threadpool works when the max # of threads is limited, so I'm not sure it means much to try and test it separately.


http://gerrit.cloudera.org:8080/#/c/9551/6/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/6/src/kudu/integration-tests/delete_table-itest.cc@1153
PS6, Line 1153: TEST_F(DeleteTableITest, TestDeleteTabletDoesNotMonopolizeServiceThreads)
Still a bad test.



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 6
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Fri, 23 Mar 2018 05:03:28 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Reviewed-on: http://gerrit.cloudera.org:8080/9551
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Kudu Jenkins
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 147 insertions(+), 44 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 9
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#5).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/integration-tests/delete_table-itest.cc
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
5 files changed, 198 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/5
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Hello Alexey Serbin, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9551

to look at the new patch set (#4).

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................

KUDU-2289: Tablet deletion should be throttled

When a table is deleted, the master eagerly sends DeleteTablet
requests for every replica of every tablet. Since DeleteTablet
can be IO-heavy and DeleteTablet is run by service threads,
deleting tables could harm other concurrent workloads.

This changes DeleteTablet to run on a threadpool. The number
of threads is capped by --num_tablets_to_delete_simultaneously,
which default to the number of data dirs, a proxy for the
number of disks. This should help throttle tablet deletions,
both preventing them from monopolizing service threads and
limiting their IO.

Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
---
M src/kudu/tserver/tablet_service.cc
M src/kudu/tserver/ts_tablet_manager.cc
M src/kudu/tserver/ts_tablet_manager.h
M src/kudu/util/threadpool.h
4 files changed, 137 insertions(+), 44 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/51/9551/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 4
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Will Berkeley (Code Review)" <ge...@cloudera.org>.
Will Berkeley has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 3:

Failure is KUDU-2236.


-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 3
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Mon, 12 Mar 2018 20:06:01 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2289: Tablet deletion should be throttled

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9551 )

Change subject: KUDU-2289: Tablet deletion should be throttled
......................................................................


Patch Set 5:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc
File src/kudu/integration-tests/delete_table-itest.cc:

http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1179
PS5, Line 1179:       .wait(true)
This is the default and can be omitted.


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1182
PS5, Line 1182:   ASSERT_OK(WaitForNumTabletsOnTS(ts, kNumTablets, MonoDelta::FromSeconds(30), nullptr));
Is this actually necessary? Won't the Create() call wait for a leader to be elected for every tablet, and since there's just one tserver, that means every tablet will be bootstrapped?


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1184
PS5, Line 1184: replicas
Nit: there's only one replica per tablet though, so here "tablet" might be more appropriate.


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1186
PS5, Line 1186:   vector<string> tablets;
Nit: could be scoped inside the ASSERT_EVENTUALLY.


http://gerrit.cloudera.org:8080/#/c/9551/5/src/kudu/integration-tests/delete_table-itest.cc@1191
PS5, Line 1191: 
              :   // Despite sending more DeleteTablet requests than (service queue length + num service threads),
              :   // the master should never have had to retry a DeleteTablet request.
Interesting approach. Could you run the following?
1. Without the patch, does this new test fail?
2. Could you ensure the new test is robust by looping it in dist-test a thousand times, preferably in TSAN mode and/or with some stress threads?



-- 
To view, visit http://gerrit.cloudera.org:8080/9551
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3819bf8a3acf8ea03a76cc6cacd92d85bb114998
Gerrit-Change-Number: 9551
Gerrit-PatchSet: 5
Gerrit-Owner: Will Berkeley <wd...@gmail.com>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Alexey Serbin <as...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Thu, 22 Mar 2018 16:06:08 +0000
Gerrit-HasComments: Yes