Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2017/11/13 22:25:51 UTC

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Hello Andrew Wong,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/8536

to review the following change.


Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................

KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit

This changes the stack watchdog so that thread unregistration no longer
blocks if the watchdog thread is in the middle of dumping a stack.

This is intended to avoid cases where a user thread is waiting to join
another thread, but that thread is blocked due to watchdog interference.

A new stress test/benchmark verifies the improvement. It simulates slow
stack trace collection by injecting latency into the watchdog thread,
and then starts and joins threads in a loop for 5 seconds. Without the
fix, it could start only about 1,000 threads/second, whereas with the
fix it can start 10,000 threads/second.

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 139 insertions(+), 17 deletions(-)
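
For readers unfamiliar with the technique, the "injecting latency into the watchdog thread" step described in the commit message can be pictured roughly as follows. This is only a minimal sketch under stated assumptions: `g_inject_latency_ms`, `MaybeInjectFixedLatency`, and `TimedWatchdogPass` are hypothetical names made up for illustration, not the helpers this change adds to fault_injection.h.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical knob: milliseconds of latency to inject; 0 disables injection.
// In a real code base this would typically be a test-only flag.
static std::atomic<int> g_inject_latency_ms{0};

// Sleeps for the configured time if injection is enabled.
// Returns true if a sleep was actually injected.
inline bool MaybeInjectFixedLatency() {
  const int ms = g_inject_latency_ms.load(std::memory_order_relaxed);
  assert(ms >= 0 && "injected latency must be non-negative");
  if (ms == 0) return false;
  std::this_thread::sleep_for(std::chrono::milliseconds(ms));
  return true;
}

// Stand-in for one pass of a watchdog loop: the injected sleep plays the
// role of a slow stack-trace collection. Returns the time the pass took.
inline std::chrono::milliseconds TimedWatchdogPass() {
  const auto start = std::chrono::steady_clock::now();
  MaybeInjectFixedLatency();
  const auto end = std::chrono::steady_clock::now();
  return std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
}
```

With the knob left at 0 the pass returns almost immediately; setting it to a positive value makes every pass slow, which is the condition under which a blocking unregistration path would show up in a thread start/join benchmark.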



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/8536/1
-- 
To view, visit http://gerrit.cloudera.org:8080/8536
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/8536 )

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h
File src/kudu/util/kernel_stack_watchdog.h:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h@176
PS1, Line 176: // Installs a callback to automatically un
> nit: maybe doc here that this is used internally by CreateTLS to create thr
Done


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc
File src/kudu/util/kernel_stack_watchdog.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc@151
PS1, Line 151:  vector<unique_ptr<TLS>> to_delete;
             :     {
             :       lock_guard<simple_spinlock> l(tls_lock_);
             :       to_delete.swap(pending_delete_);
             :       tls_map_copy = tls_by_tid_;
             :     }
             :     // Actually delete
> Is it important to document somewhere that the pending TLS instances only g
Done


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc
File src/kudu/util/stack_watchdog-test.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: e
> nit: could replace with `started % threads.size()`? One fewer variable to t
Done


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: &]() {
> nit: drop std::
Done



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Wed, 15 Nov 2017 20:22:14 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8536

to look at the new patch set (#3).

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 148 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/8536/3
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8536 )

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................


Patch Set 4: Code-Review+2


-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Sat, 18 Nov 2017 04:27:00 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/8536 )

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Reviewed-on: http://gerrit.cloudera.org:8080/8536
Tested-by: Kudu Jenkins
Reviewed-by: Andrew Wong <aw...@cloudera.com>
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 149 insertions(+), 21 deletions(-)

Approvals:
  Kudu Jenkins: Verified
  Andrew Wong: Looks good to me, approved

-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8536

to look at the new patch set (#2).

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 145 insertions(+), 18 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/8536/2
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Andrew Wong (Code Review)" <ge...@cloudera.org>.
Andrew Wong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8536 )

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................


Patch Set 1:

(4 comments)

Injection looks good, mostly nits here

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h
File src/kudu/util/kernel_stack_watchdog.h:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.h@176
PS1, Line 176: static void ThreadExiting(void* tls_void);
nit: maybe doc here that this is used internally by CreateTLS to create thread-local destructors?


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc
File src/kudu/util/kernel_stack_watchdog.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/kernel_stack_watchdog.cc@151
PS1, Line 151:  vector<unique_ptr<TLS>> to_delete;
             :     {
             :       lock_guard<simple_spinlock> l(tls_lock_);
             :       to_delete.swap(pending_delete_);
             :       tls_map_copy = tls_by_tid_;
             :     }
             :     to_delete.clear();
Is it important to document somewhere that the pending TLS instances only get destructed in RunThread? Or is that more of an implementation detail?
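
As context for the question, the pattern in the quoted lines appears to be deferred deletion: an exiting thread hands its state to a pending list under a short lock, and the watchdog thread frees that list at the top of its next pass. The following is a simplified sketch of that idea, not the Kudu implementation. `DeferredRegistry`, `BeginScan`, and the one-field `TLS` struct are hypothetical, and `std::mutex` is swapped in for `simple_spinlock` to keep the example self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for the per-thread watchdog state.
struct TLS {
  int64_t tid;
};

class DeferredRegistry {
 public:
  // Called by a newly started thread.
  void Register(int64_t tid) {
    std::unique_ptr<TLS> tls(new TLS{tid});
    std::lock_guard<std::mutex> l(lock_);
    assert(by_tid_.count(tid) == 0);
    by_tid_[tid] = std::move(tls);
  }

  // Called on thread exit. Only moves ownership under a short lock; the
  // object is not destroyed here, so a concurrent scan can still read it.
  void Unregister(int64_t tid) {
    std::lock_guard<std::mutex> l(lock_);
    auto it = by_tid_.find(tid);
    if (it == by_tid_.end()) return;
    pending_delete_.push_back(std::move(it->second));
    by_tid_.erase(it);
  }

  // Called by the single scanning (watchdog) thread at the top of each pass.
  // Frees everything queued before this pass and returns a snapshot of the
  // live entries that stays valid until the next call to BeginScan().
  std::vector<const TLS*> BeginScan() {
    std::vector<std::unique_ptr<TLS>> to_delete;
    std::vector<const TLS*> snapshot;
    {
      std::lock_guard<std::mutex> l(lock_);
      to_delete.swap(pending_delete_);
      for (const auto& e : by_tid_) snapshot.push_back(e.second.get());
    }
    to_delete.clear();  // Destructors run outside the lock.
    return snapshot;
  }

  // Number of entries queued for deletion but not yet freed.
  std::size_t pending_count() {
    std::lock_guard<std::mutex> l(lock_);
    return pending_delete_.size();
  }

 private:
  std::mutex lock_;
  std::map<int64_t, std::unique_ptr<TLS>> by_tid_;
  std::vector<std::unique_ptr<TLS>> pending_delete_;
};
```

In this sketch the snapshot and the swap happen in the same critical section, so anything unregistered after the snapshot was taken can only be freed by a later pass. That is why the scanning thread may keep using its snapshot without holding the lock, and why the exiting thread never has to wait for a scan to finish.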


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc
File src/kudu/util/stack_watchdog-test.cc:

http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: std::t
nit: drop std::


http://gerrit.cloudera.org:8080/#/c/8536/1/src/kudu/util/stack_watchdog-test.cc@139
PS1, Line 139: i
nit: could replace with `started % threads.size()`? One fewer variable to think about, as trivial as it is
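
To illustrate the suggestion, here is a rough sketch of a start-and-join loop that derives the slot index from the running count instead of tracking it in a separate variable. It is an assumption about the general shape of such a loop, not the code in stack_watchdog-test.cc; `StartAndJoinThreads`, `kSlots`, and `ran` are hypothetical names.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Starts and joins short-lived threads for roughly `duration`, keeping at
// most kSlots of them alive at once. Returns how many threads were started.
inline std::size_t StartAndJoinThreads(std::chrono::milliseconds duration) {
  constexpr std::size_t kSlots = 4;
  std::array<std::thread, kSlots> threads;
  std::atomic<std::size_t> ran{0};
  std::size_t started = 0;
  const auto deadline = std::chrono::steady_clock::now() + duration;
  while (std::chrono::steady_clock::now() < deadline) {
    // The slot index comes from the running count, so no separate index
    // variable is needed.
    std::thread& slot = threads[started % threads.size()];
    if (slot.joinable()) slot.join();  // Wait for the previous occupant.
    slot = std::thread([&ran]() { ran.fetch_add(1); });
    ++started;
  }
  for (auto& t : threads) {
    if (t.joinable()) t.join();
  }
  assert(ran.load() == started);
  return started;
}
```

Dividing the returned count by the duration gives a threads/second figure of the kind quoted in the commit message.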



-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Comment-Date: Mon, 13 Nov 2017 23:45:13 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2215. kernel stack watchdog: avoid blocking thread exit

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Tidy Bot, Andrew Wong, Kudu Jenkins, Andrew Wong, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/8536

to look at the new patch set (#4).

Change subject: KUDU-2215. kernel_stack_watchdog: avoid blocking thread exit
......................................................................

Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
---
M src/kudu/util/fault_injection.cc
M src/kudu/util/fault_injection.h
M src/kudu/util/kernel_stack_watchdog.cc
M src/kudu/util/kernel_stack_watchdog.h
M src/kudu/util/stack_watchdog-test.cc
5 files changed, 149 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/36/8536/4
-- 

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib6a349666e8484c00b2f43c5918205ec1a4c09ab
Gerrit-Change-Number: 8536
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Andrew Wong <an...@cloudera.com>
Gerrit-Reviewer: Andrew Wong <aw...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>