Posted to commits@nuttx.apache.org by ma...@apache.org on 2020/11/24 21:59:01 UTC
[incubator-nuttx] 01/06: Revert "Update TODO regarding SMP"
This is an automated email from the ASF dual-hosted git repository.
masayuki pushed a commit to branch revert-2348-fix_pause_handler_for_smp
in repository https://gitbox.apache.org/repos/asf/incubator-nuttx.git
commit f2347a708b68d6750850a9370deb856b0248948a
Author: Masayuki Ishikawa <ma...@gmail.com>
AuthorDate: Wed Nov 25 06:58:49 2020 +0900
Revert "Update TODO regarding SMP"
This reverts commit 96c29e75b7edf076e7ae3b935918b0366fce0287.
---
TODO | 66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 65 insertions(+), 1 deletion(-)
diff --git a/TODO b/TODO
index c564f75..acc6e5b 100644
--- a/TODO
+++ b/TODO
@@ -10,7 +10,7 @@ issues related to each board port.
nuttx/:
(16) Task/Scheduler (sched/)
- (2) SMP
+ (3) SMP
(1) Memory Management (mm/)
(0) Power Management (drivers/pm)
(5) Signals (sched/signal, arch/)
@@ -485,6 +485,70 @@ o SMP
any bugs caused by this. But I believe that failures are
possible.
+ Title: POSSIBLE FOR TWO CPUs TO HOLD A CRITICAL SECTION?
+ Description: The SMP design includes logic that will support multiple
+ CPUs holding a critical section. Is this necessary? How
+ can that occur? I think it can occur in the following
+ situation:
+
+ The log below was reported from NuttX running on a two-core
+ Cortex-A7 architecture in SMP mode. Notice that
+ when nxsched_add_readytorun() was called, g_cpu_irqset was 3.
+
+ nxsched_add_readytorun: irqset cpu 1, me 0 btcbname init, irqset 1 irqcount 2.
+ nxsched_add_readytorun: nxsched_add_readytorun line 338 g_cpu_irqset = 3.
+
+ This can happen, but only under a very specific condition.
+ g_cpu_irqset only exists to support this specific condition:
+
+ a. A task running on CPU 0 takes the critical section. So
+ g_cpu_irqset == 0x1.
+
+ b. A task exits on CPU 1 and a waiting, ready-to-run task
+ is re-started on CPU 1. This new task also holds the
+ critical section. So when the task is re-started on
+ CPU 1, we then have g_cpu_irqset == 0x3.
+
+ So we are in a very perverse state! There are two tasks
+ running on two different CPUs and both hold the critical
+ section. I believe that is a dangerous situation and there
+ could be undiscovered bugs that could happen in that case.
+ However, as of this moment, I have not heard of any specific
+ problems caused by this weird behavior.
+
+ A possible solution would be to add a new task state that
+ would exist only for SMP.
+
+ - Add a new SMP-only task list and state. Say,
+ g_csection_wait[]. It should be prioritized.
+ - When a task acquires the critical section, all tasks in
+ g_readytorun[] that need the critical section would be
+ moved to g_csection_wait[].
+ - When any task is unblocked for any reason and moved to the
+ g_readytorun[] list, if that unblocked task needs the
+ critical section, it would also be moved to the
+ g_csection_wait[] list. No task that needs the critical
+ section can be in the ready-to-run list if the critical
+ section is not available.
+ - When the task releases the critical section, all tasks in
+ g_csection_wait[] need to be moved back to
+ g_readytorun[].
+ - This may result in a context switch. The tasks should be
+ moved back to g_readytorun[] highest priority first. If a
+ context switch occurs and the critical section is re-taken
+ by the re-started task, the lower priority tasks in
+ g_csection_wait[] must stay in that list.
+
+ That is really not as much work as it sounds. It is
+ something that could be done in 2-3 days of work if you know
+ what you are doing. Getting the proper test setup and
+ verifying the change would be the more difficult task.
+
+Status: Open
+Priority: Unknown. Might be high, but first we would need to confirm
+ that this situation can occur and that it actually causes
+ a failure.
+
o Memory Management (mm/)
^^^^^^^^^^^^^^^^^^^^^^^