You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2021/12/09 04:37:54 UTC

[GitHub] [incubator-nuttx] anchao opened a new pull request #4953: sched/semaphore: fix race condition for timed wait

anchao opened a new pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953


   ## Summary
   
   sched/semaphore: fix race condition for timed wait
   
   There is a potential problem that can lead to deadlock at
   condition wait, if the timeout of watchdog is a very small value
   (less than 1ms ?), the timer interrupt will come before the nxsem_wait()
   
   Regression by:
   sched: semaphore: Remove a redundant critical section in nxsem_tickwait()
   sched: semaphore: Remove a redundant critical section in nxsem_clockwait()
   sched: pthread: Remove a redundant critical section in pthread_condclockwsait.c
   
   Signed-off-by: chao.an <an...@xiaomi.com>
   
   ## Impact
   
   System stability
   
   ## Testing
   
   **_timedwait() with small timeout


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992218024


   @anchao 
   
   >I reproduced this issue with new ostest update:
   >anchao/incubator-nuttx-apps@4659331
   >during the ostest processing, we need uart interrupts to help us increase the system loading:
   
   I tried the above commit and procedures with sabre-6quad:nsh and spresense:wifi but I can not reproduce a deadlock.
   I think you are using CONFIG_SCHED_RR_INTERVAL=0 with your defconfig, so I also tried the setting but I still can not reproduce a deadlock.
   
   I think it depends on configurations, so please provide the defconfig.
   Thanks,
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992036385


   >There is a potential problem that can lead to deadlock at
   >condition wait, if the timeout of watchdog is a very small value
   >(less than 1ms ?), the timer interrupt will come before the nxsem_wait()
   
   @anchao 
   Let me confirm the symptom.
   If the timer interrupt happens before calling nxsem_wait(), pthread_cond_timedwait() never returns (i.e. deadlock).
   Is my understanding correct?  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 merged pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 merged pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-989536355


   >Regression by:
   >sched: semaphore: Remove a redundant critical section in nxsem_tickwait()
   >sched: semaphore: Remove a redundant critical section in nxsem_clockwait()
   >sched: pthread: Remove a redundant critical section in pthread_condclockwsait.c
   
   @anchao 
   Can you create revert commits to trace the changes?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on a change in pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on a change in pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#discussion_r767387490



##########
File path: sched/semaphore/sem_tickwait.c
##########
@@ -125,8 +131,21 @@ int nxsem_tickwait(FAR sem_t *sem, clock_t start, uint32_t delay)
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  if (ret < 0)
+    {
+      goto errout_with_irqdisabled;
+    }
+
+  /* We can now restore interrupts */
+
+  /* Success exits */
+
+success_with_irqdisabled:
+
+  /* Error exits */
 
+errout_with_irqdisabled:

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on a change in pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on a change in pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#discussion_r767387454



##########
File path: sched/semaphore/sem_clockwait.c
##########
@@ -173,7 +179,11 @@ int nxsem_clockwait(FAR sem_t *sem, clockid_t clockid,
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  /* We can now restore interrupts and delete the watchdog */
+
+success_with_irqdisabled:
+errout_with_irqdisabled:

Review comment:
       here is redundant, i will fix this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-989505714


   cc @masayuki2009 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-989584770


   >@masayuki2009 san, please review again
   
   I'll check the commits later.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-990765026


   @anchao 
   
   >I prepared a demo, which still difficult to reproduce this issue on test environment....
   
   OK, let me try the demo.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992068222


   > @anchao Let me confirm the symptom. If the timer interrupt happens before calling nxsem_wait(), pthread_cond_timedwait() never returns (i.e. deadlock). Is my understanding correct?
   
   @masayuki2009 yes, the deadlock is triggered when the tick period is very short


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] pkarashchenko commented on a change in pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on a change in pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#discussion_r767184483



##########
File path: sched/semaphore/sem_tickwait.c
##########
@@ -125,8 +131,21 @@ int nxsem_tickwait(FAR sem_t *sem, clock_t start, uint32_t delay)
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  if (ret < 0)
+    {
+      goto errout_with_irqdisabled;
+    }
+
+  /* We can now restore interrupts */
+
+  /* Success exits */
+
+success_with_irqdisabled:
+
+  /* Error exits */
 
+errout_with_irqdisabled:

Review comment:
       What is the benefit of this vs single `out:`?

##########
File path: sched/semaphore/sem_clockwait.c
##########
@@ -173,7 +179,11 @@ int nxsem_clockwait(FAR sem_t *sem, clockid_t clockid,
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  /* We can now restore interrupts and delete the watchdog */
+
+success_with_irqdisabled:
+errout_with_irqdisabled:

Review comment:
       What is the benefit of this vs single `out:`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-989580180


   > @anchao Can you create revert commits to trace the changes?
   
   @masayuki2009 san, please review again


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992127232


   Though I think we should create a test case to reproduce the symptom with hopefully QEMU.
   let me merge this PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992104459


   >I prepared a demo, which still difficult to reproduce this issue on test environment....
   
   @anchao 
   I tried the demo both with and without this PR, but I could not reproduce the symptom.
   Actually, a deadlock happened only with sim (both non-SMP and SMP) but it still happened without this PR.
   So, this might be another issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992239523


   @masayuki2009 san,
   
   Please check the attachment, 
   [defconfig.txt](https://github.com/apache/incubator-nuttx/files/7702313/defconfig.txt)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] pkarashchenko commented on a change in pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on a change in pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#discussion_r767184483



##########
File path: sched/semaphore/sem_tickwait.c
##########
@@ -125,8 +131,21 @@ int nxsem_tickwait(FAR sem_t *sem, clock_t start, uint32_t delay)
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  if (ret < 0)
+    {
+      goto errout_with_irqdisabled;
+    }
+
+  /* We can now restore interrupts */
+
+  /* Success exits */
+
+success_with_irqdisabled:
+
+  /* Error exits */
 
+errout_with_irqdisabled:

Review comment:
       What is the benefit of this vs single `out:`?

##########
File path: sched/semaphore/sem_clockwait.c
##########
@@ -173,7 +179,11 @@ int nxsem_clockwait(FAR sem_t *sem, clockid_t clockid,
 
   wd_cancel(&rtcb->waitdog);
 
-out:
+  /* We can now restore interrupts and delete the watchdog */
+
+success_with_irqdisabled:
+errout_with_irqdisabled:

Review comment:
       What is the benefit of this vs single `out:`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992128570


   
   > @anchao I tried the demo both with and without this PR, but I could not reproduce the symptom. Actually, a deadlock happened only with sim (both non-SMP and SMP) but it still happened without this PR. So, this might be another issue.
   
   @masayuki2009 we can not use this demo on sim platform(have while (1) ), SIM is a coroutine model(use setjmp/longjmp to implement context switch), the program can not yield the time slice if busy loop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] masayuki2009 commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
masayuki2009 commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992156759


   @anchao 
   Could you provide the defconfig with which you were able to reproduce the symptom?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] anchao commented on pull request #4953: sched/semaphore: fix race condition for timed wait

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #4953:
URL: https://github.com/apache/incubator-nuttx/pull/4953#issuecomment-992188093


   @masayuki2009 san,
   
   I reproduced this issue with new ostest update:
   
   https://github.com/anchao/incubator-nuttx-apps/commit/46593311374518aac737f36d8d799d9ad09c4d4f
   
   during the ostest processing, we need uart interrupts to help us increase the system loading:
   ```
   
   ap> ostest &
   ...
   thread_waiter: Taking mutex
   thread_waiter: Starting 5 second wait for condition
   ap> ps
     PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED    CPU COMMAND
   ...
     113   113 100 FIFO     Task    --- Waiting  Signal    00000000 002032 000752  37.0%    0.0% ostest
     114   114 100 FIFO     Task    --- Waiting  Semaphore 00000000 008136 000760   9.3%    0.0% ostest Arg1 Arg2Arg3 Arg4
   ...
     119   114 100 FIFO     pthread --- Ready              00000000 002048 000468  22.8%    0.0% pt-0x2c4154fd 0
     120   114  99 FIFO     pthread --- Ready              00000000 002048 000760  37.1%    1.2% pt-0x2c41540d 0
   ```
   
   
   ------------      ..... keep typing ps .... ----------------------------------
   
   
   ```
   ap> ps
   ...
   ap> ps
   ...
   ap> ps
   ...
   ap> ps
     PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED    CPU COMMAND
   ...
     113   113 100 FIFO     Task    --- Waiting  Signal    00000000 002032 000752  37.0%    0.0% ostest
     114   114 100 FIFO     Task    --- Waiting  Semaphore 00000000 008136 000760   9.3%    0.0% ostest Arg1 Arg2 Arg3 Arg4
   ...
     119   114 100 FIFO     pthread --- Ready              00000000 002048 000468  22.8%    0.0% pt-0x2c4154fd 0
     120   114  99 FIFO     pthread --- Waiting  Semaphore 00000000 002048 000760  37.1%    1.4% pt-0x2c41540d 0
   
   ap> ps
     PID GROUP PRI POLICY   TYPE    NPX STATE    EVENT     SIGMASK   STACK   USED  FILLED    CPU COMMAND
   ...
     113   113 100 FIFO     Task    --- Waiting  Signal    00000000 002032 000752  37.0%    0.0% ostest
     114   114 100 FIFO     Task    --- Waiting  Semaphore 00000000 008136 000760   9.3%    0.0% ostest Arg1 Arg2 Arg3 Arg4
   ...
     120   114  99 FIFO     pthread --- Waiting  Semaphore 00000000 002048 000760  37.1%    0.7% pt-0x2c41540d 0
   ```
   the waiting time has already exceeded 5s, check the callstack:
   thread 119 is thread_busy, timeout and exit,
   thread 120 is thread_waiter, deadlock:
   
   ```
   ap> dumpstack 113
   [  128.702121] [121] [ EMERG] [ap] backtrace:
   [  128.702566] [121] [ EMERG] [ap] [113] [<0x2c3d74d2>] arm_switchcontext+0xe/0x18
   [  128.703782] [121] [ EMERG] [ap] [113] [<0x2c3b5ddf>] nxsig_timedwait+0xdf/0x18c
   [  128.704982] [121] [ EMERG] [ap] [113] [<0x2c3b4878>] nx_waitpid+0x158/0x238
   [  128.706282] [121] [ EMERG] [ap] [113] [<0x2c3b495c>] waitpid+0x4/0x20
   [  128.707378] [121] [ EMERG] [ap] [113] [<0x2c4157c8>] ostest_main+0x134/0x258
   [  128.708588] [121] [ EMERG] [ap] [113] [<0x2c3c9b80>] nxtask_startup+0x40/0x68
   [  128.709788] [121] [ EMERG] [ap] [113] [<0x2c3b6ed6>] nxtask_start+0x46/0x60
   ap> dumpstack 114
   [  130.567113] [122] [ EMERG] [ap] backtrace:
   [  130.567528] [122] [ EMERG] [ap] [114] [<0x2c3d74d2>] arm_switchcontext+0xe/0x18
   [  130.568786] [122] [ EMERG] [ap] [114] [<0x2c3b5169>] nxsem_wait_uninterruptible+0x69/0xac
   [  130.570114] [122] [ EMERG] [ap] [114] [<0x2c3b3c2e>] pthread_sem_take+0x1a/0x2c
   [  130.571317] [122] [ EMERG] [ap] [114] [<0x2c3b363a>] pthread_join+0x9e/0xdc
   [  130.572479] [122] [ EMERG] [ap] [114] [<0x2c4155da>] timedwait_test+0x9a/0x148
   [  130.573713] [122] [ EMERG] [ap] [114] [<0x2c41568c>] user_main+0x4/0xc
   [  130.574855] [122] [ EMERG] [ap] [114] [<0x2c3c9b80>] nxtask_startup+0x40/0x68
   [  130.576044] [122] [ EMERG] [ap] [114] [<0x2c3b6ed6>] nxtask_start+0x46/0x60
   ap> dumpstack 119
   ap> dumpstack 120
   [  135.550877] [124] [ EMERG] [ap] backtrace:
   [  135.551292] [124] [ EMERG] [ap] [120] [<0x2c3d74d2>] arm_switchcontext+0xe/0x18
   [  135.552538] [124] [ EMERG] [ap] [120] [<0x2c3b50cf>] nxsem_wait+0x57/0x88
   [  135.553681] [124] [ EMERG] [ap] [120] [<0x2c3b3ae6>] pthread_cond_clockwait+0xca/0x110
   [  135.554969] [124] [ EMERG] [ap] [120] [<0x2c415462>] thread_waiter+0x56/0xf0
   [  135.556177] [124] [ EMERG] [ap] [120] [<0x2c3c984a>] pthread_startup+0xa/0x1c
   [  135.557427] [124] [ EMERG] [ap] [120] [<0x2c3b3262>] pthread_start+0x56/0x90
   ap> 
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org