Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2020/08/14 05:27:32 UTC

[GitHub] [incubator-nuttx] anchao opened a new pull request #1585: sched/task: do not migrate the task state to INVALID

anchao opened a new pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585


   
   ## Summary
   
   sched/task: do not migrate the task state to INVALID
   
   The task state is still in use in task/nxmq_recover().
   
   Change-Id: I31273aadd9e09c283cc3d0420dfc854ca8ae1899
   Signed-off-by: chao.an <an...@xiaomi.com>
   
   
   assertion:
   up_assert: Assertion failed at file:mqueue/mq_sndinternal.c line: 420 task: lpwork
   up_registerdump: R0: 00000001 10008640 100126df 1000f524 00000000 9b8123d4 10012600 1000f81c
   up_registerdump: R8: 10013358 0000002c 00000000 10013420 00004000 10013358 9b0070c7 9b007716
   up_registerdump: xPSR: 61000000 BASEPRI: 000000a0 CONTROL: 00000000
   up_registerdump: EXC_RETURN: fffffff9
   up_dumpstate: sp:         10013358
   up_dumpstate: stack base: 10013500
   
   
   ## Impact
   
   ## Testing
   
   Kill a thread that is blocked in mq_receive(2), then restart the thread.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-719143911


   @btashton please help merge this into 10.0.0 to avoid the regression, thanks.





[GitHub] [incubator-nuttx] anchao commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-674126688


   This issue is caused by a mismatch in the mqueue_inode_s::nwaitnotempty counter.
   From the code, if we kill a thread that is blocked in mq_receive(2):
   
   ```
   nxtask_terminate
   |
    ->dtcb->task_state = TSTATE_TASK_INVALID;  <--- task state has been switched to invalid 
    ->nxtask_exithook
      |
       ->nxtask_recover
         |
          ->nxmq_recover
   ```
   
   ```
   void nxmq_recover(FAR struct tcb_s *tcb)
   { 
   ...
     if (tcb->task_state == TSTATE_WAIT_MQNOTEMPTY)    <--- state has already changed to INVALID, so this branch is skipped
       {
   ...
         tcb->msgwaitq->nwaitnotempty--;               <--- never runs, so nwaitnotempty remains 1
       }
     else if (tcb->task_state == TSTATE_WAIT_MQNOTFULL)
       { 
         DEBUGASSERT(tcb->msgwaitq && tcb->msgwaitq->nwaitnotfull > 0);
         tcb->msgwaitq->nwaitnotfull--;
       }
   }
   ```
   
   Since nwaitnotempty is still 1, if the application then calls mq_send, the system triggers an assertion:
   
   ```
   int nxmq_do_send(mqd_t mqdes, FAR struct mqueue_msg_s *mqmsg,
                    FAR const char *msg, size_t msglen, unsigned int prio)
   {
    ...
     if (msgq->nwaitnotempty > 0)
       {
   ...
         DEBUGASSERT(btcb);
   ...
   }
   ```
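
   As a rough illustration of why the assertion fires: the excerpt above elides how `btcb` is obtained. The sketch below assumes it is found by walking a list of blocked tasks and matching on the queue pointer. All names here (`msgq_model`, `waiter_model`, `find_waiter`, `send_check`) are hypothetical stand-ins, not NuttX APIs.

   ```c
   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Hypothetical, simplified stand-ins; not the real NuttX definitions. */

   struct msgq_model
   {
     int nwaitnotempty;               /* Number of tasks believed to be waiting */
   };

   struct waiter_model
   {
     struct waiter_model *flink;      /* Next blocked task in the wait list */
     struct msgq_model   *msgwaitq;   /* Queue this task is blocked on */
   };

   /* Return the first blocked task waiting on msgq, or NULL if none exists. */

   static struct waiter_model *find_waiter(struct waiter_model *head,
                                           struct msgq_model *msgq)
   {
     struct waiter_model *btcb = head;

     while (btcb != NULL && btcb->msgwaitq != msgq)
       {
         btcb = btcb->flink;
       }

     return btcb;
   }

   /* Return 0 if counter and list agree, or -1 if the counter claims a
    * waiter that the list does not contain.  The -1 case models the point
    * where DEBUGASSERT(btcb) would fail.
    */

   static int send_check(struct waiter_model *head, struct msgq_model *msgq)
   {
     if (msgq->nwaitnotempty > 0 && find_waiter(head, msgq) == NULL)
       {
         return -1;
       }

     return 0;
   }

   int main(void)
   {
     struct msgq_model q = { 1 };             /* Stale count: one waiter */
     struct waiter_model w = { NULL, &q };    /* A real waiter on q */

     assert(send_check(&w, &q) == 0);         /* Waiter present: consistent */
     assert(send_check(NULL, &q) == -1);      /* Waiter gone, count stale */

     printf("stale counter detected: %d\n", send_check(NULL, &q));
     return 0;
   }
   ```

   The second check corresponds to the situation described in this comment: the killed thread is no longer in the wait list, but the counter was never decremented.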
   
   To fix this issue, we must delay the task state switch until after the recover() call. But if the state switch happens right after recover(), there is no reason to change the state at all, because the TCB will be destroyed soon. So it is better to remove the switch to the invalid state.
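
   The ordering problem can be reproduced in a small standalone model. This is only a sketch under simplifying assumptions: the types and functions (`state_model`, `tcb_model`, `recover_model`, `terminate_model`) are hypothetical and only mirror the shape of the excerpts quoted in this comment.

   ```c
   #include <assert.h>
   #include <stdio.h>

   /* Hypothetical stand-ins for the task states used by the recovery logic. */

   enum state_model
   {
     STATE_INVALID = 0,
     STATE_WAIT_MQNOTEMPTY,
     STATE_WAIT_MQNOTFULL
   };

   struct msgq_model
   {
     int nwaitnotempty;
     int nwaitnotfull;
   };

   struct tcb_model
   {
     enum state_model   task_state;
     struct msgq_model *msgwaitq;
   };

   /* Same branching as the nxmq_recover() excerpt: the waiter count is only
    * decremented if the task state still says the task was waiting.
    */

   static void recover_model(struct tcb_model *tcb)
   {
     if (tcb->task_state == STATE_WAIT_MQNOTEMPTY)
       {
         tcb->msgwaitq->nwaitnotempty--;
       }
     else if (tcb->task_state == STATE_WAIT_MQNOTFULL)
       {
         tcb->msgwaitq->nwaitnotfull--;
       }
   }

   /* Terminate a task that is blocked waiting for a message.  If
    * invalidate_first is nonzero, the state is overwritten before recovery
    * runs (the failing order).  Returns the waiter count left on the queue.
    */

   static int terminate_model(int invalidate_first)
   {
     struct msgq_model q = { 1, 0 };
     struct tcb_model tcb = { STATE_WAIT_MQNOTEMPTY, &q };

     if (invalidate_first)
       {
         tcb.task_state = STATE_INVALID;
       }

     recover_model(&tcb);
     return q.nwaitnotempty;
   }

   int main(void)
   {
     assert(terminate_model(1) == 1);   /* Count leaks: recovery skipped */
     assert(terminate_model(0) == 0);   /* Count released: state preserved */

     printf("invalidate first: %d, state preserved: %d\n",
            terminate_model(1), terminate_model(0));
     return 0;
   }
   ```

   In this model, leaving the state untouched until recovery has run is enough to keep the counter consistent, which is the behavior the PR aims for.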
   





[GitHub] [incubator-nuttx] anchao commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-693508669


   Hi @patacongo ,
   
   I raised an issue at https://github.com/apache/incubator-nuttx/issues/1804; could you please have a look?
   I will close this PR if the issue can be resolved completely.





[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-691505046


   @patacongo could you take a look? It is a panic issue, and we should fix it as soon as possible.







[GitHub] [incubator-nuttx] patacongo commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-684053229


   > 
   > 
   > @patacongo do you want to:
   > 1. Move lines 158-161 after line 188
   > 2. Or move lines 177-188 before line 158
   > Either modification would introduce a large change to the termination sequence. Since the TCB is freed immediately at line 199, the change made by @anchao is reasonable.
   
   I actually do not have any strong opinion. I do not fully understand either the problem or the solution. My only point is that disassociating task_state from the list that the TCB resides in is very dangerous. At the least, this danger should be documented fully and clearly in comments so that no one seriously breaks things in the future.
   
   The meaning of task_state is the list that the TCB resides in.  The PR breaks that meaning.





[GitHub] [incubator-nuttx] xiaoxiang781216 merged pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 merged pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585


   





[GitHub] [incubator-nuttx] anchao commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
anchao commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-674153735


   In fact, by the time the task is terminated, it has already been removed from the list.
   We just delay the state switch until the TCB is released, which is normal behavior.
   
   Keeping the TCB in the list cannot resolve this issue, because the state of the message queue is not reset to 0.
   As you know, most of the time we cannot control how an application uses system calls in userspace...





[GitHub] [incubator-nuttx] patacongo commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-674092957


   The task state in all other cases is the same as the list in which the TCB resides.  I don't understand why you break that rule here.  It can break list management if the task state does not agree with the list that the TCB resides in.





[GitHub] [incubator-nuttx] patacongo commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
patacongo commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-674146696


   It is very dangerous if the task_state gets out of sync with the lists. That should not be permitted. If that happens, functions like nxsched_remove_blocked() and nxsched_add_blocked() will corrupt the OS lists.
   
   Can you keep the TCB in the list until you are ready to change the state?
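
   To make the concern concrete, here is a minimal sketch, assuming that list removal selects its list from task_state. The real nxsched_remove_blocked() is not shown in this thread, so every name below (`node_model`, `queue_model`, `g_lists`, `add_model`, `remove_model`) is a hypothetical stand-in.

   ```c
   #include <assert.h>
   #include <stddef.h>
   #include <stdio.h>

   /* Hypothetical model: one doubly linked list per task state. */

   enum
   {
     LIST_A = 0,
     LIST_B,
     NLISTS
   };

   struct node_model
   {
     struct node_model *flink;        /* Next entry in the list */
     struct node_model *blink;        /* Previous entry in the list */
     int                task_state;   /* Index of the list we claim to be in */
   };

   struct queue_model
   {
     struct node_model *head;
     struct node_model *tail;
   };

   static struct queue_model g_lists[NLISTS];

   /* Append the node to the list for 'state' and record that state. */

   static void add_model(struct node_model *n, int state)
   {
     struct queue_model *q = &g_lists[state];

     n->task_state = state;
     n->flink      = NULL;
     n->blink      = q->tail;

     if (q->tail != NULL)
       {
         q->tail->flink = n;
       }
     else
       {
         q->head = n;
       }

     q->tail = n;
   }

   /* Unlink the node from the list selected by its task_state.  This is
    * only correct while task_state names the list the node really is in.
    */

   static void remove_model(struct node_model *n)
   {
     struct queue_model *q = &g_lists[n->task_state];

     if (n->blink != NULL)
       {
         n->blink->flink = n->flink;
       }
     else
       {
         q->head = n->flink;
       }

     if (n->flink != NULL)
       {
         n->flink->blink = n->blink;
       }
     else
       {
         q->tail = n->blink;
       }

     n->flink = NULL;
     n->blink = NULL;
   }

   int main(void)
   {
     struct node_model t1;
     struct node_model t2;

     add_model(&t1, LIST_A);
     add_model(&t2, LIST_B);

     t1.task_state = LIST_B;                 /* State no longer matches list */
     remove_model(&t1);

     assert(g_lists[LIST_B].head == NULL);   /* t2 was dropped from its list */
     assert(g_lists[LIST_A].head == &t1);    /* t1 is still linked in list A */

     printf("list B lost its entry: %s\n",
            g_lists[LIST_B].head == NULL ? "yes" : "no");
     return 0;
   }
   ```

   In this model a single mismatched state silently drops an unrelated task from its list, which is the kind of damage the warning above is about.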





[GitHub] [incubator-nuttx] xiaoxiang781216 edited a comment on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 edited a comment on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-684157177


   > > @patacongo do you want to:
   > > 1. Move lines 158-161 after line 188
   > > 2. Or move lines 177-188 before line 158
   > > Either modification would introduce a large change to the termination sequence. Since the TCB is freed immediately at line 199, the change made by @anchao is reasonable.
   > 
   > I actually do not have any strong opinion. I do not fully understand either the problem or the solution. My only point is that disassociating task_state from the list that the TCB resides in is very dangerous. At the least, this danger should be documented fully and clearly in comments so that no one seriously breaks things in the future.
   
   The problem is that if a thread is waiting on a message queue and another thread calls pthread_cancel on it:
   1. nxtask_terminate changes the state to TSTATE_TASK_INVALID
   2. and then calls nxmq_recover to clean up the message queue state
   3. but nxmq_recover skips all actions because the thread's state no longer matches (see step 1):
   ```
     /* Was the task waiting for a message queue to become non-empty? */
   
     if (tcb->task_state == TSTATE_WAIT_MQNOTEMPTY)
       {
         /* Decrement the count of waiters */
   
         DEBUGASSERT(tcb->msgwaitq && tcb->msgwaitq->nwaitnotempty > 0);
         tcb->msgwaitq->nwaitnotempty--;
       }
   ```  
   Then the message queue enters an inconsistent state and will panic on any upcoming mq_xxx invocation.
   Actually, this regression was introduced by your commit:
   ```
   commit e24f2814015c6dcd9fc67edcd399ccbaf41c3669
   Author: Gregory Nutt <gn...@nuttx.org>
   Date:   Sun Nov 20 07:57:18 2016 -0600
   
       This commit adds a new internal interfaces and fixes a problem with three APIs in the SMP configuration.  The new internal interface is sched_cpu_pause(tcb).  This function will pause a CPU if the task associated with 'tcb' is running on that CPU.  This allows a different CPU to modify that OS data stuctures associated with the CPU.  When the other CPU is resumed, those modifications can safely take place.
       
       The three fixes are to handle cases in the SMP configuration where one CPU does need to make modifications to TCB and data structures on a task that could be running running on another CPU.  Those three cases are task_delete(), task_restart(), and execution of signal handles.  In all three cases the solutions is basically the same:  (1) Call sched_cpu_pause(tcb) to pause the CPU on which the task is running, (2) perform the necessary operations, then (3) call up_cpu_resume() to restart the paused CPU.
   ```
   
   > 
   > The meaning of task_state is the list that the TCB resides in. The PR breaks that meaning.
   
   






[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-683615363


   @patacongo do you want to:
   1. Move lines 158-161 after line 188
   2. Or move lines 177-188 before line 158
   Either modification would introduce a large change to the termination sequence. Since the TCB is freed immediately at line 199, the change made by @anchao is reasonable.





[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #1585: sched/task: do not migrate the task state to INVALID

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on pull request #1585:
URL: https://github.com/apache/incubator-nuttx/pull/1585#issuecomment-684157177


   > > @patacongo do you want to:
   > > 1. Move lines 158-161 after line 188
   > > 2. Or move lines 177-188 before line 158
   > > Either modification would introduce a large change to the termination sequence. Since the TCB is freed immediately at line 199, the change made by @anchao is reasonable.
   > 
   > I actually do not have any strong opinion. I do not fully understand either the problem or the solution. My only point is that disassociating task_state from the list that the TCB resides in is very dangerous. At the least, this danger should be documented fully and clearly in comments so that no one seriously breaks things in the future.
   
   The problem is that if a thread is waiting on a message queue and another thread calls pthread_cancel on it:
   1. nxtask_terminate changes the state to TSTATE_TASK_INVALID
   2. and then calls nxmq_recover to clean up the message queue state
   3. but nxmq_recover skips all actions because the thread's state no longer matches:
   ```
     /* Was the task waiting for a message queue to become non-empty? */
   
     if (tcb->task_state == TSTATE_WAIT_MQNOTEMPTY)
       {
         /* Decrement the count of waiters */
   
         DEBUGASSERT(tcb->msgwaitq && tcb->msgwaitq->nwaitnotempty > 0);
         tcb->msgwaitq->nwaitnotempty--;
       }
   ```  
   Then the message queue enters an inconsistent state and will panic on any mq_xxx API call.
   Actually, this regression was introduced by your commit:
   ```
   commit e24f2814015c6dcd9fc67edcd399ccbaf41c3669
   Author: Gregory Nutt <gn...@nuttx.org>
   Date:   Sun Nov 20 07:57:18 2016 -0600
   
       This commit adds a new internal interfaces and fixes a problem with three APIs in the SMP configuration.  The new internal interface is sched_cpu_pause(tcb).  This function will pause a CPU if the task associated with 'tcb' is running on that CPU.  This allows a different CPU to modify that OS data stuctures associated with the CPU.  When the other CPU is resumed, those modifications can safely take place.
       
       The three fixes are to handle cases in the SMP configuration where one CPU does need to make modifications to TCB and data structures on a task that could be running running on another CPU.  Those three cases are task_delete(), task_restart(), and execution of signal handles.  In all three cases the solutions is basically the same:  (1) Call sched_cpu_pause(tcb) to pause the CPU on which the task is running, (2) perform the necessary operations, then (3) call up_cpu_resume() to restart the paused CPU.
   ```
   
   > 
   > The meaning of task_state is the list that the TCB resides in. The PR breaks that meaning.
   
   

