You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by je...@apache.org on 2020/11/03 07:19:49 UTC

[incubator-nuttx] branch master updated: sched: irq: Fix enter_critical_section() in an irq handler for SMP

This is an automated email from the ASF dual-hosted git repository.

jerpelea pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-nuttx.git


The following commit(s) were added to refs/heads/master by this push:
     new 6ec9408  sched: irq: Fix enter_critical_section() in an irq handler for SMP
6ec9408 is described below

commit 6ec94082a10a914acc50330ab40f63a425fa67d0
Author: Masayuki Ishikawa <ma...@gmail.com>
AuthorDate: Mon Nov 2 17:06:21 2020 +0900

    sched: irq: Fix enter_critical_section() in an irq handler for SMP
    
    Summary:
    - I found a deadlock during Wi-Fi audio streaming test plus stress test
    - The testing environment was spresense:wifi_smp (NCPUS=4)
    - The deadlock happened because two CPUs called up_cpu_pause() almost simultaneously
    - This situation should not happen, because up_cpu_pause() is called in a critical section
    - Actually, the latter call was from nxsem_post() in an IRQ handler
    - And when enter_critical_section() was called, irq_waitlock() detected a deadlock
    - Then it called up_cpu_paused() to break the deadlock
    - However, this resulted in setting g_cpu_irqset on the CPU
    - Even though another CPU had held a g_cpu_irqlock
    - This situation violates the critical section and should be avoided
    - To avoid the situation, if a CPU sets g_cpu_irqset after calling up_cpu_paused()
    - The CPU must release g_cpu_irqlock first
    - Then retry irq_waitlock() to acquire g_cpu_irqlock
    
    Impact:
    - Affect SMP
    
    Testing:
    - Tested with spresense:wifi_smp (NCPUS=2 and 4)
    - Tested with spresense:smp
    - Tested with sim:smp
    - Tested with sabre-6quad:smp (QEMU)
    - Tested with maix-bit:smp (QEMU)
    - Tested with esp32-core:smp (QEMU)
    - Tested with lc823450-xgevk:rndis
    
    Signed-off-by: Masayuki Ishikawa <Ma...@jp.sony.com>
---
 sched/irq/irq_csection.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/sched/irq/irq_csection.c b/sched/irq/irq_csection.c
index 6993641..9eeb4b4 100644
--- a/sched/irq/irq_csection.c
+++ b/sched/irq/irq_csection.c
@@ -265,6 +265,7 @@ try_again:
                    * no longer blocked by the critical section).
                    */
 
+try_again_in_irq:
                   if (!irq_waitlock(cpu))
                     {
                       /* We are in a deadlock condition due to a pending
@@ -273,6 +274,24 @@ try_again:
                        */
 
                       DEBUGVERIFY(up_cpu_paused(cpu));
+
+                      /* NOTE: As the result of up_cpu_paused(cpu), this CPU
+                       * might set g_cpu_irqset in nxsched_resume_scheduler()
+                       * However, another CPU might hold g_cpu_irqlock.
+                       * To avoid this situation, releae g_cpu_irqlock first.
+                       */
+
+                      if ((g_cpu_irqset & (1 << cpu)) != 0)
+                        {
+                          spin_clrbit(&g_cpu_irqset, cpu, &g_cpu_irqsetlock,
+                                      &g_cpu_irqlock);
+                        }
+
+                      /* NOTE: Here, this CPU does not hold g_cpu_irqlock,
+                       * so call irq_waitlock(cpu) to acquire g_cpu_irqlock.
+                       */
+
+                      goto try_again_in_irq;
                     }
                 }