You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/03/30 04:04:07 UTC

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Hello Mike Percy,

I'd like you to do a code review. Please visit

    http://gerrit.cloudera.org:8080/9867

to review the following change.


Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................

KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

This test was flaky in TSAN builds for two reasons:

(1) even though taking the stack trace is always pretty fast, symbolization can
sometimes take more than a second. If that was the case, the assertion that
looked at the diagnostics log content would fail just because the expected log
was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test
failure rate with 4 stress threads down from about 60% to about 0.3%.

(2) with periodic stack dumping enabled, it was possible that the periodic dump
was already in progress at the time that the overflow-triggered dump was
triggered. In that case, the overflow-triggered dump would be ignored. That
was expected behavior, but the test would fail due to not seeing the overflow
listed as the "reason". This patch fixes this issue by disabling periodic
dumping in the test.

In order to disable periodic dumping but leave triggered dumping enabled,
I changed the semantics of the configuration slightly. Previously, setting
it to '0' would disable all dumping. Now, setting it to '0' disables
periodic dumping but still leaves triggered dumps enabled. Setting it to
'-1' disables all dumping.

With both these changes I was able to loop the test 300 times with 4 stress
threads in TSAN mode with no failures.

Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
---
M src/kudu/master/master-test.cc
M src/kudu/server/diagnostics_log.cc
2 files changed, 26 insertions(+), 10 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106
PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms);
Nit: alphabetical order (at least switch with master_inject_latency_on_tablet_lookups_ms).



-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 15:02:09 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................


Patch Set 2: Verified+1


-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 21:06:19 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................

KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

This test was flaky in TSAN builds for two reasons:

(1) even though taking the stack trace is always pretty fast, symbolization can
sometimes take more than a second. If that was the case, the assertion that
looked at the diagnostics log content would fail just because the expected log
was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test
failure rate with 4 stress threads down from about 60% to about 0.3%.

(2) with periodic stack dumping enabled, it was possible that the periodic dump
was already in progress at the time that the overflow-triggered dump was
triggered. In that case, the overflow-triggered dump would be ignored. That
was expected behavior, but the test would fail due to not seeing the overflow
listed as the "reason". This patch fixes this issue by disabling periodic
dumping in the test.

In order to disable periodic dumping but leave triggered dumping enabled,
I changed the semantics of the configuration slightly. Previously, setting
it to '0' would disable all dumping. Now, setting it to '0' disables
periodic dumping but still leaves triggered dumps enabled. Setting it to
'-1' disables all dumping.

With both these changes I was able to loop the test 300 times with 4 stress
threads in TSAN mode with no failures.

Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Reviewed-on: http://gerrit.cloudera.org:8080/9867
Reviewed-by: Adar Dembo <ad...@cloudera.com>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/master/master-test.cc
M src/kudu/server/diagnostics_log.cc
2 files changed, 26 insertions(+), 10 deletions(-)

Approvals:
  Adar Dembo: Looks good to me, approved
  Todd Lipcon: Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Mike Percy, Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9867

to look at the new patch set (#2).

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................

KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

This test was flaky in TSAN builds for two reasons:

(1) even though taking the stack trace is always pretty fast, symbolization can
sometimes take more than a second. If that was the case, the assertion that
looked at the diagnostics log content would fail just because the expected log
was not there _yet_. Fixing this issue using ASSERT_EVENTUALLY brought the test
failure rate with 4 stress threads down from about 60% to about 0.3%.

(2) with periodic stack dumping enabled, it was possible that the periodic dump
was already in progress at the time that the overflow-triggered dump was
triggered. In that case, the overflow-triggered dump would be ignored. That
was expected behavior, but the test would fail due to not seeing the overflow
listed as the "reason". This patch fixes this issue by disabling periodic
dumping in the test.

In order to disable periodic dumping but leave triggered dumping enabled,
I changed the semantics of the configuration slightly. Previously, setting
it to '0' would disable all dumping. Now, setting it to '0' disables
periodic dumping but still leaves triggered dumps enabled. Setting it to
'-1' disables all dumping.

With both these changes I was able to loop the test 300 times with 4 stress
threads in TSAN mode with no failures.

Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
---
M src/kudu/master/master-test.cc
M src/kudu/server/diagnostics_log.cc
2 files changed, 26 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/67/9867/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc
File src/kudu/master/master-test.cc:

http://gerrit.cloudera.org:8080/#/c/9867/1/src/kudu/master/master-test.cc@106
PS1, Line 106: DECLARE_int32(diagnostics_log_stack_traces_interval_ms);
> Nit: alphabetical order (at least switch with master_inject_latency_on_tabl
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:09:42 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has removed a vote on this change.

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................


Removed Verified-1 by Kudu Jenkins (120)
-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9867 )

Change subject: KUDU-2361. Fix flaky MasterTest.TestDumpStacksOnRpcQueueOverflow
......................................................................


Patch Set 2: Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If7078bc87d2cf724d5f2fddf1173ac1786279e7e
Gerrit-Change-Number: 9867
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <mp...@apache.org>
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:26:19 +0000
Gerrit-HasComments: No