Posted to dev@qpid.apache.org by "Ken Giusti (Jira)" <ji...@apache.org> on 2021/05/10 19:05:00 UTC

[jira] [Updated] (DISPATCH-2059) Support running router under rr during test execution

     [ https://issues.apache.org/jira/browse/DISPATCH-2059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ken Giusti updated DISPATCH-2059:
---------------------------------
    Fix Version/s: Backlog

> Support running router under rr during test execution
> -----------------------------------------------------
>
>                 Key: DISPATCH-2059
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-2059
>             Project: Qpid Dispatch
>          Issue Type: Wish
>          Components: Tests
>    Affects Versions: 1.15.0
>            Reporter: Jiri Daněk
>            Assignee: Jiri Daněk
>            Priority: Major
>             Fix For: Backlog
>
>
> Dispatch has the env variable {{QPID_DISPATCH_RUNNER}} which, according to a code comment, is intended for running tests under valgrind. That comment is outdated, because memory checking is currently solved in a different way, in {{RuntimeChecks.cmake}}. One tool that would make sense to use to wrap dispatch is rr, the record-replay debugger from Mozilla (https://rr-project.org/).
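A minimal sketch of how a test harness could use such a variable to wrap the router command line. This is an assumption, not the actual system_test.py code: the helper name `wrap_with_runner` is hypothetical, and it assumes the variable holds a shell-style command prefix such as "rr record", which is split with shlex and prepended to the router's argv.

```python
import os
import shlex
from typing import List, Mapping, Optional


def wrap_with_runner(router_args: List[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Prefix the router command line with QPID_DISPATCH_RUNNER, if set.

    Hypothetical helper: assumes the variable contains a command prefix
    (e.g. "rr record") that should run the router as its child.
    """
    if env is None:
        env = os.environ
    runner = env.get("QPID_DISPATCH_RUNNER", "")
    # shlex.split("") returns [], so an unset/empty runner leaves argv unchanged.
    return shlex.split(runner) + list(router_args)
```

For example, `wrap_with_runner(["qdrouterd", "-c", "qdrouterd.conf"], {"QPID_DISPATCH_RUNNER": "rr record"})` returns `["rr", "record", "qdrouterd", "-c", "qdrouterd.conf"]`.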
> I've previously tried rr with (very) limited success in DISPATCH-782.
> [~aconway] considered it while working on DISPATCH-902 and used it on other issues.
> There was an earlier attempt to use rr (https://issues.apache.org/jira/browse/DISPATCH-739?focusedCommentId=15983719&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-15983719), which, however, has not survived in mainline to the present day.
> I have two problems with rr:
> # Dispatch system-tests send SIGTERM to the subprocess they started, which is rr itself. What is actually needed is to kill rr's children instead, because killing rr abruptly terminates the recording. When I press ^C on a {{rr record qdrouterd -c ...}} in the terminal, the signal correctly reaches the child. I am not sure what happens differently in the test. Explicitly killing only the children in the system test does the right thing. Sadly, doing that requires hacks: Python's subprocess module does not make it easy to query a process's children. The os module offers some ways; psutil is the easiest, but that's a third-party dependency.
> # The CLion debugger disconnects during replay when qdrouterd gets SIGTERM, even though the router handles that signal and keeps running (to do its cleanup).
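A minimal Linux-only sketch of "kill only the children" using just the standard library, by scanning /proc instead of depending on psutil (where `psutil.Process(pid).children()` would do the lookup). It assumes the router is a direct child of rr; the function names `child_pids` and `terminate_children` are hypothetical, and `proc` is assumed to be the `subprocess.Popen` object the test started.

```python
import os
import signal
from typing import List


def child_pids(parent_pid: int) -> List[int]:
    """Return the PIDs of the direct children of parent_pid (Linux only).

    Reads /proc/<pid>/stat, whose fields after the parenthesised command
    name are: state, ppid, ...
    """
    children = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                stat = f.read()
        except OSError:
            continue  # the process exited while we were scanning
        # The command name may contain spaces or ')', so split after the last ')'.
        _state, ppid = stat[stat.rindex(")") + 2:].split()[:2]
        if int(ppid) == parent_pid:
            children.append(int(entry))
    return children


def terminate_children(proc) -> None:
    """Send SIGTERM to the children of proc (e.g. rr), but not to proc itself."""
    for pid in child_pids(proc.pid):
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the child already exited
```

The wrapper (rr in the real case) then sees its child exit on its own terms and can finish writing the recording, instead of being torn down mid-trace.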
> One awesome feature of rr is that a recording can be replayed many times, backwards and forwards, and all memory addresses stay the same on every replay. This means one can set {{watch -l *0x0000000}} watchpoints on specific memory locations and use the {{reverse-cont}} gdb command. (rr presents the gdb UI; if I understand correctly, it is actually a wrapper over gdb.)
> h3. Chaos mode
> rr has a {{--chaos}} switch which tries to explore different thread schedules so as to reveal more crashes; that could be useful here.
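Chaos mode is most useful when a flaky test is recorded over and over until it fails. A minimal sketch of such a loop, assuming `rr record` exits with the status of the recorded program; the function `record_until_failure` and its `recorder` parameter are hypothetical (the parameter exists only so the loop can be tried without rr installed).

```python
import subprocess
from typing import List, Optional


def record_until_failure(cmd: List[str], attempts: int = 20,
                         recorder: Optional[List[str]] = None) -> Optional[int]:
    """Re-run cmd under `rr record --chaos` until it exits non-zero.

    Returns the 1-based number of the first failing attempt, or None if
    every run succeeded. Assumes rr propagates the recorded program's
    exit status, so the failing run is also the latest recording.
    """
    if recorder is None:
        recorder = ["rr", "record", "--chaos"]
    for attempt in range(1, attempts + 1):
        result = subprocess.run(recorder + cmd)
        if result.returncode != 0:
            return attempt
    return None
```

Stopping at the first failure keeps the interesting run as the most recent recording, which is the one a plain `rr replay` opens.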


