You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@kudu.apache.org by "Todd Lipcon (Code Review)" <ge...@cloudera.org> on 2018/03/30 02:43:30 UTC

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/9864


Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exitinng just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
---
M src/kudu/util/debug-util-test.cc
1 file changed, 32 insertions(+), 6 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/9864/1
-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Adar Dembo (Code Review)" <ge...@cloudera.org>.
Adar Dembo has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17
PS2, Line 17: exitinng
exiting



-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 14:42:26 +0000
Gerrit-HasComments: Yes

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, Adar Dembo, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9864

to look at the new patch set (#3).

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exiting just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
---
M src/kudu/util/debug-util-test.cc
1 file changed, 30 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/9864/3
-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................


Patch Set 3: Verified+1 Code-Review+2


-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 3
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:07:37 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exiting just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Reviewed-on: http://gerrit.cloudera.org:8080/9864
Reviewed-by: Todd Lipcon <to...@apache.org>
Tested-by: Todd Lipcon <to...@apache.org>
---
M src/kudu/util/debug-util-test.cc
1 file changed, 30 insertions(+), 6 deletions(-)

Approvals:
  Todd Lipcon: Looks good to me, approved; Verified

-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 4
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Hello Kudu Jenkins, 

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/9864

to look at the new patch set (#2).

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................

KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

This test was flaky with two different failure modes which I believe
have the same root cause: when the test starts, it's possible that there
is still some thread from a prior test case in the process of shutting
down. Even though we join() on our threads religiously, there is a very
short period of time after join() returns where the dying thread is
still listed in /proc/self/task/. This meant that we would sometimes
fail the assertion that checks that the number of returned stacks
matches the number of running threads (because the thread finished
exitinng just before we called SnapshotAllStacks). In other cases we
would still see that exiting thread, but it would exit before responding
to our stack trace signal, so we'd see a "timed out" response.

The fix is a little hacky - we calculate the expected number of running
threads and ASSERT_EVENTUALLY to wait until reality matches
expectations.

Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
---
M src/kudu/util/debug-util-test.cc
1 file changed, 30 insertions(+), 6 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/kudu refs/changes/64/9864/2
-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................


Patch Set 1:

I looped this 100 times prior to the fix, with --stress-cpu-threads=4, and it failed 4%. I then looped with the fix 400 times and it failed 0.


-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 02:59:27 +0000
Gerrit-HasComments: No

[kudu-CR] KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot

Posted by "Todd Lipcon (Code Review)" <ge...@cloudera.org>.
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/9864 )

Change subject: KUDU-2393. Fix flaky DebugUtilTest.TestSnapshot
......................................................................


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/9864/2//COMMIT_MSG@17
PS2, Line 17: exitinng
> exiting
I bet you can't wait for my new laptop to arrive



-- 
To view, visit http://gerrit.cloudera.org:8080/9864
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I2f0fc29877b4d67a3720e47f4183684891f09bc0
Gerrit-Change-Number: 9864
Gerrit-PatchSet: 2
Gerrit-Owner: Todd Lipcon <to...@apache.org>
Gerrit-Reviewer: Adar Dembo <ad...@cloudera.com>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <to...@apache.org>
Gerrit-Comment-Date: Fri, 30 Mar 2018 16:06:46 +0000
Gerrit-HasComments: Yes