Posted to notifications@mynewt.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2017/05/13 01:41:04 UTC

[jira] [Commented] (MYNEWT-745) Sim - deadlock involving system calls

    [ https://issues.apache.org/jira/browse/MYNEWT-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16009030#comment-16009030 ] 

ASF subversion and git services commented on MYNEWT-745:
--------------------------------------------------------

Commit cc1acfe8ddc8d88c1637b7bd7374c35fc0ace90f in incubator-mynewt-core's branch refs/heads/master from [~ccollins476]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-mynewt-core.git;h=cc1acfe ]

MYNEWT-745 Sim - deadlock involving system calls

This commit splits sim into two separate implementations:
    * "signals"
    * "no-signals"

The user chooses which implementation to use via the
MCU_NATIVE_USE_SIGNALS syscfg setting (defined in hw/mcu/native).  The
two implementations are described below:

signals:
    More correct; less stable.  The OS tick timer can cause a
    high-priority task to preempt a low-priority task.  This
    hurts stability because a task can be preempted while it is
    in the middle of a system call, potentially causing deadlock
    or memory corruption.

no-signals:
    Less correct; more stable.  The OS tick timer only runs
    while the idle task is active.  Therefore, a sleeping
    high-priority task will not preempt a low-priority task due
    to a timing event (e.g., an expired delay or callout).
    However, this version of sim does not suffer from the
    stability issues that affect the "signals" implementation.
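
A minimal sketch of the "no-signals" idea described above, under stated assumptions: with no tick signal, OS time only advances when the idle task runs and converts elapsed host time into whole ticks. The names demo_tick_init(), demo_idle_catch_up() and demo_os_time, and the 128 Hz tick rate, are hypothetical and are not taken from hw/mcu/native. The caller is assumed to pass a host timestamp, for example one obtained from clock_gettime(CLOCK_MONOTONIC, ...).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical tick rate, chosen so that one tick is exactly 7812500 ns. */
#define DEMO_TICKS_PER_SEC 128
#define DEMO_NSEC_PER_TICK (1000000000LL / DEMO_TICKS_PER_SEC)

static uint32_t demo_os_time;          /* OS time, in ticks */
static struct timespec demo_last_tick; /* host time at which the last tick was delivered */

static int64_t elapsed_ns(const struct timespec *from, const struct timespec *to)
{
    return (int64_t)(to->tv_sec - from->tv_sec) * 1000000000LL
         + (to->tv_nsec - from->tv_nsec);
}

void demo_tick_init(const struct timespec *start)
{
    demo_os_time = 0;
    demo_last_tick = *start;
}

/* Called only from the idle task.  No signal ever interrupts a running
 * task, so ticks that elapse while another task runs are delivered late,
 * in one batch, the next time the idle task gets the CPU.
 * Returns the number of ticks delivered by this call. */
uint32_t demo_idle_catch_up(const struct timespec *now)
{
    int64_t ns = elapsed_ns(&demo_last_tick, now);
    if (ns < DEMO_NSEC_PER_TICK)
        return 0;

    int64_t ticks = ns / DEMO_NSEC_PER_TICK;
    int64_t advance = ticks * DEMO_NSEC_PER_TICK;

    /* Move the reference point forward by whole ticks only, so the
     * fractional remainder is carried into the next call. */
    demo_last_tick.tv_sec += (time_t)(advance / 1000000000LL);
    demo_last_tick.tv_nsec += (long)(advance % 1000000000LL);
    if (demo_last_tick.tv_nsec >= 1000000000L) {
        demo_last_tick.tv_sec += 1;
        demo_last_tick.tv_nsec -= 1000000000L;
    }

    demo_os_time += (uint32_t)ticks;
    /* A real idle task would now expire timers/callouts and run the scheduler. */
    return (uint32_t)ticks;
}
```

The batching is the loss of correctness mentioned above: a high-priority task whose delay expires while a low-priority task is computing is only noticed once the idle task runs again.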


> Sim - deadlock involving system calls
> -------------------------------------
>
>                 Key: MYNEWT-745
>                 URL: https://issues.apache.org/jira/browse/MYNEWT-745
>             Project: Mynewt
>          Issue Type: Bug
>            Reporter: Christopher Collins
>            Assignee: Christopher Collins
>             Fix For: v1_1_0_rel
>
>         Attachments: main.c
>
>
> The problem appears to occur when a system call is interrupted by a sim context switch.  Because a sim context switch is implemented as a signal handler that never returns (it calls longjmp()), the system call is left unfinished.  In some cases, the system call appears to have acquired resources that it never got a chance to release, leading to deadlock on a subsequent system call. For reasons that are not yet clear, when the original system call is resumed (i.e., when Mynewt switches back to the original task), it is unable to recover.
> In sim, a context switch is triggered by delivery of a SIGURG signal. A few events generate this signal:
> # A task calls an OS function with the potential to switch tasks (e.g., os_eventq_get(), os_mutex_release(), etc.).
> # An OS tick occurs.
> The problem appears to occur when an OS tick generates the SIGURG signal.  The OS ticker is implemented via an itimer, which generates a SIGALRM signal on each tick.  The SIGALRM handler advances the OS time and then calls os_sched(), potentially generating a SIGURG signal.  If the current task happens to be in the middle of a system call when the tick timer expires, the SIGURG signal is handled before the system call returns.
> Here is a stack trace showing a context switch in the middle of a system call:
> {noformat}
> (gdb) whe
> #0  0x0804a3bd in ctxsw_handler (sig=23)
>     at kernel/os/src/arch/sim/os_arch_sim.c:150
> #1  <signal handler called>
> #2  0xf7ffdbe7 in __kernel_vsyscall ()
> #3  0x08097630 in __lll_lock_wait_private ()
> #4  0x080923b0 in __tz_convert ()
> #5  0x08091673 in localtime ()
> #6  0x0809162c in ctime ()
> #7  0x08048a5a in task1_handler (arg=0x0) at apps/slinky/src/main.c:162
> #8  0x0804a2c8 in os_arch_task_start (sf=0x8160314, rc=1)
>     at kernel/os/src/arch/sim/os_arch_sim.c:88
> #9  0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> #10 0x0804ad90 in os_arch_frame_init ()
>     at kernel/os/src/arch/sim/os_arch_stack_frame.s:98
> {noformat}
> Attached is a simple Mynewt app that can be used to replicate this issue (main.c).
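
The failure chain described in the issue above can be modelled in a small POSIX program. This is a hedged sketch, not the sim code: an itimer delivers SIGALRM; the tick handler advances a counter and raises SIGURG, standing in for os_sched() deciding to switch tasks; the SIGURG handler jumps away and never returns. A volatile flag stands in for whatever resource the interrupted call had acquired, and pause() stands in for the system call. All demo_* names are hypothetical. The issue says the real handler calls longjmp(); the sketch uses sigsetjmp()/siglongjmp() so the signal mask is restored after the jump.

```c
#define _XOPEN_SOURCE 700
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static sigjmp_buf demo_other_task;           /* where the "scheduler" resumes */
static volatile sig_atomic_t demo_os_ticks;  /* OS time, in ticks */
static volatile sig_atomic_t demo_lock_held; /* stands in for a lock taken inside a syscall */

/* SIGURG: the context switch.  Like the sim handler, it never returns. */
static void demo_ctxsw_handler(int sig)
{
    (void)sig;
    siglongjmp(demo_other_task, 1);
}

/* SIGALRM: the OS tick.  Advance OS time, then request a switch
 * (a stand-in for os_sched() picking a higher-priority task). */
static void demo_tick_handler(int sig)
{
    (void)sig;
    demo_os_ticks++;
    raise(SIGURG);
}

/* Returns 1 if the "syscall" was abandoned with its lock still held.
 * tick_usec must be in the range 1..999999. */
int demo_preempted_syscall(long tick_usec)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = demo_ctxsw_handler;
    sigaction(SIGURG, &sa, NULL);
    sa.sa_handler = demo_tick_handler;
    sigaction(SIGALRM, &sa, NULL);

    demo_lock_held = 0;
    if (sigsetjmp(demo_other_task, 1) == 0) {
        struct itimerval tick;

        memset(&tick, 0, sizeof tick);
        tick.it_value.tv_usec = tick_usec;
        tick.it_interval.tv_usec = tick_usec;

        demo_lock_held = 1;                  /* the "syscall" takes its lock */
        setitimer(ITIMER_REAL, &tick, NULL); /* the tick may fire from here on */
        pause();                             /* still inside when it does */
        demo_lock_held = 0;                  /* release: never reached */
    }

    /* Normally reached via siglongjmp() from the SIGURG handler; the mask
     * saved by sigsetjmp(..., 1) is restored, unblocking SIGALRM/SIGURG. */
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_REAL, &off, NULL);      /* stop the ticker */

    sa.sa_handler = SIG_DFL;
    sigaction(SIGALRM, &sa, NULL);
    sigaction(SIGURG, &sa, NULL);
    return demo_lock_held;
}
```

Calling demo_preempted_syscall(10000) is expected to return 1: the tick lands while the call is in progress, the SIGURG handler jumps away, and the release line never runs. This mirrors the hypothesis in the issue that the interrupted call never reaches its cleanup, so anything it acquired stays acquired.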



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)