Posted to notifications@mynewt.apache.org by "Christopher Collins (JIRA)" <ji...@apache.org> on 2017/05/10 00:42:04 UTC

[jira] [Assigned] (MYNEWT-745) Sim - deadlock involving system calls

     [ https://issues.apache.org/jira/browse/MYNEWT-745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Collins reassigned MYNEWT-745:
------------------------------------------

    Assignee: Christopher Collins

> Sim - deadlock involving system calls
> -------------------------------------
>
>                 Key: MYNEWT-745
>                 URL: https://issues.apache.org/jira/browse/MYNEWT-745
>             Project: Mynewt
>          Issue Type: Bug
>            Reporter: Christopher Collins
>            Assignee: Christopher Collins
>             Fix For: v1_1_0_rel
>
>         Attachments: main.c
>
>
> The problem appears to occur when a system call is interrupted by a sim context switch.  Because a sim context switch is implemented as a signal handler that never returns (it calls longjmp()), the system call is left unfinished.  In some cases, it seems the system call has acquired resources that it never gets a chance to release, leading to deadlock on a subsequent system call. For reasons not yet understood, when the original system call is resumed (i.e., when Mynewt switches back to the original task), the syscall is unable to recover.
> In sim, a context switch is triggered by delivery of a SIGURG signal. A few events generate this signal:
> # A task calls an OS function with the potential to switch tasks (e.g., os_eventq_get(), os_mutex_release(), etc.).
> # An OS tick occurs.
> The problem appears to occur when an OS tick generates the SIGURG signal.  The OS ticker is implemented via an interval timer (itimer), which generates the SIGALRM signal on each tick.  The SIGALRM handler advances the OS time and then calls os_sched(), potentially generating a SIGURG signal.  If the current task happens to be in the middle of a syscall when the tick timer expires, the SIGURG signal is handled before the syscall returns.
> Here is a stack trace showing a context switch in the middle of a system call:
> {noformat}
> (gdb) whe
> #0  0x0804a3bd in ctxsw_handler (sig=23)
>     at kernel/os/src/arch/sim/os_arch_sim.c:150
> #1  <signal handler called>
> #2  0xf7ffdbe7 in __kernel_vsyscall ()
> #3  0x08097630 in __lll_lock_wait_private ()
> #4  0x080923b0 in __tz_convert ()
> #5  0x08091673 in localtime ()
> #6  0x0809162c in ctime ()
> #7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
> #8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
>     at kernel/os/src/arch/sim/os_arch_sim.c:88
> #9  0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> #10 0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> {noformat}
> Attached is a simple Mynewt app that can be used to replicate this issue (main.c).
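
The following is not the attached main.c. It is a minimal standalone sketch of the failure mode described above, assuming a POSIX host. The names fake_lock_held, fake_trylock(), fake_call(), ctxsw_sketch() and demo_lock_leak() are hypothetical and stand in for the lock taken inside libc (see __lll_lock_wait_private in the trace) and for the sim context switch handler. The signal is raised synchronously here so the outcome is deterministic; in sim it would arrive asynchronously from the tick.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <string.h>

/* Hypothetical stand-in for a lock held inside a library call. */
static volatile sig_atomic_t fake_lock_held;

/* Saved context of the code that plays the role of "the other task". */
static sigjmp_buf sched_ctx;

/* Mimics the sim context switch: the handler jumps away and never returns. */
static void ctxsw_sketch(int sig)
{
    (void)sig;
    siglongjmp(sched_ctx, 1);
}

/* Returns 0 if the lock was taken, -1 if it is already held. */
static int fake_trylock(void)
{
    if (fake_lock_held) {
        return -1;
    }
    fake_lock_held = 1;
    return 0;
}

/* Stand-in for the interrupted call: takes the lock, then gets "switched out". */
static void fake_call(void)
{
    (void)fake_trylock();
    raise(SIGURG);          /* context switch arrives mid-call */
    fake_lock_held = 0;     /* release; not reached in this sketch */
}

/* Returns 1 if the lock is still held after the simulated context switch. */
int demo_lock_leak(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = ctxsw_sketch;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGURG, &sa, NULL);

    /* Save the signal mask too, so SIGURG is unblocked again after the jump. */
    if (sigsetjmp(sched_ctx, 1) == 0) {
        fake_call();
    }

    /* Now running as "another task": a second attempt to lock fails. */
    return fake_trylock() != 0;
}
```

Calling demo_lock_leak() from a test program's main() returns 1: the second lock attempt fails because the release in fake_call() was skipped. With a real blocking lock, the second caller would wait forever instead of failing.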

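Since the tick path described in the report goes through SIGALRM and then SIGURG, one conceivable way to keep a tick from preempting a library call is to block both signals for the duration of the call. The sketch below illustrates that idea only; it is an assumption, not necessarily the fix adopted for this issue, and the helper name ctime_unswitched() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <time.h>

/*
 * Hypothetical helper: formats *t into buf (at least 26 bytes) with the
 * tick (SIGALRM) and context switch (SIGURG) signals blocked, so neither
 * handler can run while the library call is in progress.
 * Returns 0 on success, -1 on failure.
 */
int ctime_unswitched(const time_t *t, char *buf)
{
    sigset_t block;
    sigset_t prev;
    char *res;

    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    sigaddset(&block, SIGURG);

    if (sigprocmask(SIG_BLOCK, &block, &prev) != 0) {
        return -1;
    }

    res = ctime_r(t, buf);

    /* Restore the previous mask; a blocked pending signal is delivered now. */
    sigprocmask(SIG_SETMASK, &prev, NULL);

    return res != NULL ? 0 : -1;
}
```

A signal that arrives while blocked stays pending and is delivered once the mask is restored, so the context switch is deferred rather than lost. The cost is added latency on every wrapped call.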


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)