You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/02/14 03:22:13 UTC

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Hello Will Berkeley, Mike Percy,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9318

to review the following change.


Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 276 insertions(+), 109 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................


Patch Set 6:

(1 comment)

Looks good, there were a couple comments missed in rev 5

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@219
PS4, Line 219: if (!sig) {
> Because we're using the signal data as a sort of queue to move the 'info' p
Oops, I figured this out and forgot to delete the comment. Thanks for the explanation though.



-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 08:00:12 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#4).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 309 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/4
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#2).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 306 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@213
PS5, Line 213: sig
> nit: name this sig_data for consistency when tracing both sides of the comm
Done


http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@350
PS5, Line 350:     DLOG(WARNING) << "Leaking SignalData structure " << sig_data_ << " after lost signal "
> I think we could at least make the signalled thread try to delete the sig_d
Deleting inside the signal handler is not allowed since free() is not an async-safe function.

I was actually hoping to eventually change this to leak into a freelist so the "leaked" data blocks can be reused by queueing them to another tid. Will add a TODO for that.



-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 18:49:30 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Reviewed-on: http://gerrit.cloudera.org:8080/9318
Reviewed-by: Mike Percy <mp...@apache.org>
Tested-by: Kudu Jenkins
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)

Approvals:
  Mike Percy: Looks good to me, approved
  Kudu Jenkins: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 8
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#6).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/6
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 6
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................


Patch Set 5:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h
File src/kudu/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h@208
PS4, Line 208: Trigger
TriggerAsync


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@141
PS4, Line 141:   void Signal() {
add: // Wakes all waiters, notifying them of completion.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@151
PS4, Line 151:   bool WaitUntil(MonoTime deadline) {
add: // Returns false if reached deadline before completion, true otherwise.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@216
PS4, Line 216: maybe
nit: capitalize


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@219
PS4, Line 219: ANNOTATE_HAPPENS_AFTER(sig);
why is this needed?


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@230
PS4, Line 230: 2
nit: /*skip_frames=*/


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@312
PS4, Line 312: }
nit: move up one line


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@395
PS4, Line 395: SYS_rt_tgsigqueueinfo
Did you look into using pthread_sigqueue() ? I suppose a minor benefit of this approach is that we can use the native tid?


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@396
PS4, Line 396: )
, ErrnoToString(errno), errno) ?


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@427
PS4, Line 427: Status StackTraceCollector::TriggerAsync(int64_t tid_, StackTrace* stack) {
missing void StackTraceCollector::RevokeSigData() for non-linux


http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@213
PS5, Line 213: sig
nit: name this sig_data for consistency when tracing both sides of the communication logic.


http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@350
PS5, Line 350:     DLOG(WARNING) << "Leaking SignalData structure " << sig_data_ << " after lost signal "
I think we could at least make the signalled thread try to delete the sig_data structure if the caller has given up. I haven't tested this but I think we could use something like:

  -  if (!sig->queued_to_tid.compare_exchange_strong(my_tid, SignalData::kDumpInProgress)) {
  +  int64_t old_val = sig->queued_to_tid.exchange(SignalData::kDumpInProgress);
  +  if (old_val != my_tid) {
  +    delete sig;
       return;
     }



-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 5
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 03:06:10 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#3).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 308 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................


Patch Set 4:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h
File src/kudu/util/debug-util.h:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.h@208
PS4, Line 208: Trigger
> TriggerAsync
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@141
PS4, Line 141:   void Signal() {
> add: // Wakes all waiters, notifying them of completion.
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@151
PS4, Line 151:   bool WaitUntil(MonoTime deadline) {
> add: // Returns false if reached deadline before completion, true otherwise
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@216
PS4, Line 216: maybe
> nit: capitalize
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@219
PS4, Line 219: ANNOTATE_HAPPENS_AFTER(sig);
> why is this needed?
Because we're using the signal data as a sort of queue to move the 'info' pointer between two threads, we need this to ensure that TSAN sees there is an enforced ordering between the two. Normally we get this ordering implied when thread A acquires a mutex or spinlock previously released by thread B, but in this case there is no such mutex.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@230
PS4, Line 230: 2
> nit: /*skip_frames=*/
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@312
PS4, Line 312: }
> nit: move up one line
Done


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@395
PS4, Line 395: SYS_rt_tgsigqueueinfo
> Did you look into using pthread_sigqueue() ? I suppose a minor benefit of t
yea, the problem is that for pthread_sigqueue we need to have pthread_t for all the threads, and best I can tell there is no way to go from a tid to a pthread_t. Additionally in the past I looked into how to list all running threads using pthreads APIs and came up with nothing particularly nice. There's some 'thread_db' API but it seems meant for debuggers and not easy to use at all.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@396
PS4, Line 396: )
> , ErrnoToString(errno), errno) ?
this ends up printed in the /stacks page and stuff, so I think it's better to just have the nice message here.


http://gerrit.cloudera.org:8080/#/c/9318/4/src/kudu/util/debug-util.cc@427
PS4, Line 427: Status StackTraceCollector::TriggerAsync(int64_t tid_, StackTrace* stack) {
> missing void StackTraceCollector::RevokeSigData() for non-linux
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 07:13:38 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Mike Percy (Code Review)" <ge...@cloudera.org>.
Mike Percy has posted comments on this change. ( http://gerrit.cloudera.org:8080/9318 )

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................


Patch Set 7: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc
File src/kudu/util/debug-util.cc:

http://gerrit.cloudera.org:8080/#/c/9318/5/src/kudu/util/debug-util.cc@350
PS5, Line 350:   // global free-list, and then reuse them the next time we want to send a
> Deleting inside the signal handler is not allowed since free() is not an as
Ah, right. Ok looks good.



-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>
Gerrit-Comment-Date: Wed, 21 Feb 2018 19:23:22 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2291 (part 5): allow collecting stack traces asynchronously

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Will Berkeley, Tidy Bot, Mike Percy, Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9318

to look at the new patch set (#7).

Change subject: KUDU-2291 (part 5): allow collecting stack traces asynchronously
......................................................................

KUDU-2291 (part 5): allow collecting stack traces asynchronously

This removes the global variable previously used to communicate between
a stack trace collector and the stack trace target. The global variable
had the downside of allowing only a single stack trace operation in
flight at a time, which means that for use cases where we want to
capture the trace of a bunch of threads, we needed to do so serially.

For use cases like process-wide traces this could make debugging a bit
tricky. For example, because of the skew in time between collecting
the stack of thread A and the stack of thread B, it's possible that
the traces could show them holding the same lock, which we know to be
impossible. Getting the collection time as close as possible to
"simultaneous" can reduce the possibility of these kind of
strange-looking results.

The actual mechanism to do this is slightly tricky. Rather than
duplicate the full description here, check the comments in the
implementation. In particular, StackTraceCollector::RevokeSigData()
explains the complexity of not knowing whether our signal will ever be
delivered.

A new unit test shows the pattern, and this also updates the stack
servlet to send the signals to all of the threads as close as possible
to "the same time".

Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
---
M src/kudu/server/default_path_handlers.cc
M src/kudu/util/debug-util-test.cc
M src/kudu/util/debug-util.cc
M src/kudu/util/debug-util.h
4 files changed, 317 insertions(+), 109 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/18/9318/7
-- 
To view, visit http://gerrit.cloudera.org:8080/9318
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ie4d7789bb1272db033df4685f46eaf8483ec7be1
Gerrit-Change-Number: 9318
Gerrit-PatchSet: 7
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Tidy Bot
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Will Berkeley <wd...@gmail.com>