Posted to notifications@mynewt.apache.org by "Christopher Collins (JIRA)" <ji...@apache.org> on 2017/05/08 18:23:04 UTC

[jira] [Created] (MYNEWT-745) Sim - deadlock involving system calls

Christopher Collins created MYNEWT-745:
------------------------------------------

             Summary: Sim - deadlock involving system calls
                 Key: MYNEWT-745
                 URL: https://issues.apache.org/jira/browse/MYNEWT-745
             Project: Mynewt
          Issue Type: Bug
            Reporter: Christopher Collins
             Fix For: v1_1_0_rel


The problem appears to occur when a system call is interrupted by a sim context switch.  Because a sim context switch is implemented as a signal handler that never returns (it calls longjmp()), the system call is left unfinished.  In some cases, the interrupted call appears to have acquired a resource that it never gets a chance to release, so a subsequent system call deadlocks.
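
The failure mode can be sketched outside of Mynewt in a few lines of C.  This is an illustration only, not sim code: fake_ctxsw_handler, lock_held, and interrupted_call_leaves_lock_held are hypothetical names, and a plain flag stands in for whatever lock the interrupted call takes internally.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf switch_ctx;             /* hypothetical "other task" context */
static volatile sig_atomic_t lock_held;   /* stands in for an internal lock */

/* Models a context switch: the handler jumps away and never returns. */
static void fake_ctxsw_handler(int sig)
{
    (void)sig;
    siglongjmp(switch_ctx, 1);
}

/* Returns 1 if the "lock" is still held after the handler jumped away. */
int interrupted_call_leaves_lock_held(void)
{
    struct sigaction sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = fake_ctxsw_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGURG, &sa, NULL) != 0) {
        return -1;
    }

    /* Make sure SIGURG can be delivered in this thread. */
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    lock_held = 0;
    if (sigsetjmp(switch_ctx, 1) == 0) {
        lock_held = 1;      /* the call acquires its resource */
        raise(SIGURG);      /* "context switch" arrives mid-call */
        lock_held = 0;      /* release: never reached */
    }
    return lock_held;
}
```

Calling interrupted_call_leaves_lock_held() returns 1: the release line is skipped, so any later caller that needs the same lock would wait forever.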

Sim has protections in place to prevent this problem from happening.  Specifically, a context switch is triggered by delivery of a SIGURG signal, and SIGURG is only sent from within the SIGALRM signal handler.  These handlers are configured such that all signals are blocked until the handlers complete (I am not sure how this works for the SIGURG handler, considering it never returns).
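
On the parenthetical question: POSIX restores the signal mask on siglongjmp() only if the context was saved by sigsetjmp() with a nonzero savemask argument.  The sketch below is a standalone illustration of that rule and makes no claim about how sim actually saves its contexts; jumping_handler and sigurg_blocked_after_jump are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf jump_ctx;

/* Handler that leaves via siglongjmp() instead of returning. */
static void jumping_handler(int sig)
{
    (void)sig;
    siglongjmp(jump_ctx, 1);
}

/*
 * Returns 1 if SIGURG is still blocked after its handler jumped away,
 * 0 if it is unblocked, -1 on setup failure.
 */
int sigurg_blocked_after_jump(int savemask)
{
    struct sigaction sa;
    sigset_t set, orig, cur;
    int blocked;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = jumping_handler;
    sigfillset(&sa.sa_mask);            /* block everything while handling */
    if (sigaction(SIGURG, &sa, NULL) != 0) {
        return -1;
    }

    /* Unblock SIGURG and remember the caller's mask for cleanup. */
    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    sigprocmask(SIG_UNBLOCK, &set, &orig);

    if (sigsetjmp(jump_ctx, savemask) == 0) {
        raise(SIGURG);                  /* handler jumps back to sigsetjmp */
    }

    /* Inspect the mask that is in effect after the jump. */
    sigprocmask(SIG_SETMASK, NULL, &cur);
    blocked = sigismember(&cur, SIGURG);

    sigprocmask(SIG_SETMASK, &orig, NULL);
    return blocked;
}
```

sigurg_blocked_after_jump(1) returns 0 because the saved mask is restored, while sigurg_blocked_after_jump(0) returns 1 because the handler's mask stays in effect.  A handler that never returns therefore has to either jump to a context saved with its mask or unblock signals explicitly afterwards.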

My initial guess was that a pending SIGURG signal does not get delivered as soon as it is unblocked at the end of the SIGALRM handler.  However, a simple test using sigpending() and sleep() proves that this is not the case.
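
A check along these lines might look like the following.  This is a hedged reconstruction, not the original test: it uses raise() in place of a real timer and sleep() so that the result is deterministic, and all function and variable names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t pending_in_alarm;  /* SIGURG pending inside SIGALRM handler? */
static volatile sig_atomic_t alarm_done;        /* set when SIGALRM handler finishes */
static volatile sig_atomic_t urg_after_alarm;   /* did SIGURG handler run after that? */

static void test_urg_handler(int sig)
{
    (void)sig;
    urg_after_alarm = alarm_done;
}

static void test_alarm_handler(int sig)
{
    sigset_t pend;

    (void)sig;
    raise(SIGURG);              /* blocked by sa_mask, so it stays pending */
    sigpending(&pend);
    pending_in_alarm = sigismember(&pend, SIGURG);
    alarm_done = 1;
}

/*
 * Returns 1 if SIGURG is still pending after the SIGALRM handler has
 * returned, 0 if it has already been delivered.
 */
int sigurg_pending_after_alarm(int *was_pending_inside, int *ran_after_alarm)
{
    struct sigaction sa;
    sigset_t set, pend;

    memset(&sa, 0, sizeof sa);
    sigfillset(&sa.sa_mask);    /* all signals blocked while a handler runs */
    sa.sa_handler = test_urg_handler;
    sigaction(SIGURG, &sa, NULL);
    sa.sa_handler = test_alarm_handler;
    sigaction(SIGALRM, &sa, NULL);

    sigemptyset(&set);
    sigaddset(&set, SIGURG);
    sigaddset(&set, SIGALRM);
    sigprocmask(SIG_UNBLOCK, &set, NULL);

    pending_in_alarm = 0;
    alarm_done = 0;
    urg_after_alarm = 0;

    raise(SIGALRM);             /* stands in for the timer tick */

    sigpending(&pend);
    *was_pending_inside = pending_in_alarm;
    *ran_after_alarm = urg_after_alarm;
    return sigismember(&pend, SIGURG);
}
```

On Linux, SIGURG is pending inside the SIGALRM handler, its handler runs once the SIGALRM handler has finished, and nothing is left pending afterwards, which matches the observation above.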

Here is a stack trace showing a context switch in the middle of a system call:

```
(gdb) whe
#0  0x0804a3bd in ctxsw_handler (sig=23)
    at kernel/os/src/arch/sim/os_arch_sim.c:150
#1  <signal handler called>
#2  0xf7ffdbe7 in __kernel_vsyscall ()
#3  0x08097630 in __lll_lock_wait_private ()
#4  0x080923b0 in __tz_convert ()
#5  0x08091673 in localtime ()
#6  0x0809162c in ctime ()
#7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
#8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
    at kernel/os/src/arch/sim/os_arch_sim.c:88
#9  0x0804ad90 in os_arch_frame_init ()
    at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
#10 0x0804ad90 in os_arch_frame_init ()
    at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
```

Attached is a simple Mynewt app that can be used to replicate this issue (main.c).


