You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2022/07/16 14:38:27 UTC

[GitHub] [incubator-nuttx] fjpanag opened a new pull request, #6631: Added memory health checks.

fjpanag opened a new pull request, #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631

   ## Summary
   
   The PR #5368 removed the memory checks from the idle thread, due to issues with priority inheritance.
   
   This is an attempt to restore these checks, this time running in a worker.
   
   If this feature is enabled, a worker will be started during system boot, that performs the following checks periodically:
   * Checks the stacks of all threads for any possible overflow.
   * Checks the heap for any corruption.
   
   ## Impact
   
   None on existing systems.
   
   ## Testing
   
   I did a quick test on a custom board running on an STM32F427 MCU.  
   The check is started and executed as expected.
   
   ---
   
   Note, although this code is tested to be working, it is marked as a draft.
   
   I am not very sure about the correct structure of the code, and if it adheres to NuttX standards.
   Please review it, and provide me with some feedback.
   
   Mostly, is it correct to create this new dir `mm_check`, or does the code belong anywhere else?  
   Is the worker started at the correct place?  
   Any naming issues?  
   Etc...
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] fjpanag commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
fjpanag commented on PR #6631:
URL: https://github.com/apache/nuttx/pull/6631#issuecomment-1336161897

   > @fjpanag do you want to improve this patch?
   
   Yes yes, I will.
   
   I am struggling to find some free time, first for the TCP connections, and then for this.
   
   I will try my best to finish this soon.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] ALTracer commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
ALTracer commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186977386

   > We can refine how to trigger the check. But irq_dispatch is the best place to get the reliable result since if kmm_checkforcorruption report the error, we can ascertain that the interrupted thread corrupt the memory. On the another hand, even LP work detect the memory corruption, it's still hard to identify which thread corrupt the memory.
   
   Okay, can someone launch SEGGER SystemView and verify the board spends a reasonable amount of time checking all the heaps on every context switch? What if a heap is located in slow external SDRAM?
   LPWORK thread, on the other hand, runs once a second or slower. I'd change the Kconfig setting from seconds to milliseconds, though. (Or microseconds, to be consistent with other apps)
   And there's always KAsan, which is designed to catch heap access errors with its guard pages.
   
   > It's better to enable STACK_CANARIES, ARMV8M_STACKCHECK_HARDWARE or ARMV[7|8]M_STACKCHECK, since they can report the stack overflow immediately.
   
   Yes, on armv8-m stackcheck_hardware is essentially free. However, on armv6/7-m it reserves a register (r10), requires userspace be compiled with additional CFLAGS, and checks stacks on function exits (not on context switch, which might be more often) IIRC. Again, that doesn't let us set a safety margin of e.g. 90%, like in this PR.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [nuttx] xiaoxiang781216 commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on PR #6631:
URL: https://github.com/apache/nuttx/pull/6631#issuecomment-1335390481

   @fjpanag do you want to improve this patch?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186994432

   > > We can refine how to trigger the check. But irq_dispatch is the best place to get the reliable result since if kmm_checkforcorruption report the error, we can ascertain that the interrupted thread corrupt the memory. On the another hand, even LP work detect the memory corruption, it's still hard to identify which thread corrupt the memory.
   > 
   > Okay, can someone launch SEGGER SystemView and verify the board spends a reasonable amount of time checking all the heaps on every context switch? What if a heap is located in slow external SDRAM?
   
   It isn't fast as expect, that's why we add the per TCB filter capability. One approach is do the check when the next thread is idle which give the same check as before.
   
   > LPWORK thread, on the other hand, runs once a second or slower. I'd change the Kconfig setting from seconds to milliseconds, though. (Or microseconds, to be consistent with other apps) And there's always KAsan, which is designed to catch heap access errors with its guard pages.
   
   If you like, the same policy can be inserted in irq_dispatch.
   
   > 
   > > It's better to enable STACK_CANARIES, ARMV8M_STACKCHECK_HARDWARE or ARMV[7|8]M_STACKCHECK, since they can report the stack overflow immediately.
   > 
   > Yes, on armv8-m stackcheck_hardware is essentially free. However, on armv6/7-m it reserves a register (r10), requires userspace be compiled with additional CFLAGS, and checks stacks on function exits (not on context switch, which might be more often) IIRC. Again, that doesn't let us set a safety margin of e.g. 90%, like in this PR.
   
   You can try STACK_CANARIES for arm6/7-m which is fast and lightweight:
   https://lwn.net/Articles/584225/


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186231460

   @fjpanag use the check here should be enough:
   https://github.com/apache/incubator-nuttx/blob/master/sched/irq/irq_dispatch.c#L177-L183


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1200434101

   @fjpanag @ALTracer 
   https://github.com/apache/incubator-nuttx/pull/6724 provide the method to enable the heap check from nsh.
   So, you can do:
   
   1. Check the heap corruption issue by:
       a. KSan https://github.com/apache/incubator-nuttx/blob/master/mm/Kconfig#L181-L187
       b. Per-thread heap check(https://github.com/apache/incubator-nuttx/pull/6724) by:
           echo 1 > /proc/xxx/heapcheck
       c. Track the allocator(pid and backtrace) by MM_BACKTRACE:
           https://github.com/apache/incubator-nuttx/blob/master/mm/Kconfig#L189-L200
           and memdump command from:
           https://github.com/apache/incubator-nuttx-apps/blob/master/nshlib/nsh_mmcmds.c#L53-L64
   2. Check the stack corruption issue by:
       a. The stack coloration:
           https://github.com/apache/incubator-nuttx/blob/master/Kconfig#L1869-L1878
       b. The stack canaries:
           https://github.com/apache/incubator-nuttx/blob/master/Kconfig#L1880-L1892
       c. The stack hw/sw check:
           https://github.com/apache/incubator-nuttx/blob/master/arch/arm/src/armv8-m/Kconfig#L108-L146
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] pkarashchenko commented on a diff in pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on code in PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#discussion_r922697668


##########
mm/mm_check/mm_check.c:
##########
@@ -0,0 +1,94 @@
+/****************************************************************************
+ * mm/mm_check/mm_check.c
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.  The
+ * ASF licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the
+ * License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ *
+ ****************************************************************************/
+
+/****************************************************************************
+ * Included Files
+ ****************************************************************************/
+
+#include <nuttx/mm/mm.h>
+#include <nuttx/sched.h>
+#include <nuttx/irq.h>
+#include <nuttx/wqueue.h>
+#include <nuttx/arch.h>
+#include <nuttx/compiler.h>
+#include <nuttx/config.h>
+
+#include <time.h>
+#include <assert.h>
+#include <sys/types.h>
+
+/****************************************************************************
+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#ifndef CONFIG_SCHED_LPWORK
+#error "Low priority work queue is required for the memory health checks."

Review Comment:
   ```suggestion
   #  error "Low priority work queue is required for the memory health checks."
   ```



##########
mm/mm_check/mm_check.c:
##########
@@ -0,0 +1,94 @@
+/****************************************************************************
+ * mm/mm_check/mm_check.c
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.  The
+ * ASF licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the
+ * License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ *
+ ****************************************************************************/
+
+/****************************************************************************
+ * Included Files
+ ****************************************************************************/
+
+#include <nuttx/mm/mm.h>
+#include <nuttx/sched.h>
+#include <nuttx/irq.h>
+#include <nuttx/wqueue.h>
+#include <nuttx/arch.h>
+#include <nuttx/compiler.h>
+#include <nuttx/config.h>
+
+#include <time.h>
+#include <assert.h>
+#include <sys/types.h>
+
+/****************************************************************************
+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#ifndef CONFIG_SCHED_LPWORK
+#error "Low priority work queue is required for the memory health checks."
+#endif
+
+/****************************************************************************
+ * Private Types
+ ****************************************************************************/
+
+static struct work_s work_q;
+
+/****************************************************************************
+ * Private Functions
+ ****************************************************************************/
+
+static void mm_check_worker(FAR void * arg)
+{
+  UNUSED(arg);
+
+  int i;
+  FAR struct tcb_s *tcb;
+  irqstate_t flags;
+  extern FAR struct tcb_s **g_pidhash;
+  extern volatile int g_npidhash;
+
+  for (i = 0; i < g_npidhash; i++)
+    {
+      flags = enter_critical_section();
+
+      tcb = g_pidhash[i];
+
+      if (tcb && ((up_check_tcbstack(tcb) * 100 / tcb->adj_stack_size)
+          > CONFIG_MM_STACK_USAGE_SAFE_PERCENT))
+        {
+          PANIC();
+        }
+
+      leave_critical_section(flags);
+    }
+
+  kmm_checkcorruption();
+
+  work_queue(LPWORK, &work_q, mm_check_worker, NULL,
+             (CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC));

Review Comment:
   ```suggestion
                CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC);
   ```



##########
mm/mm_check/mm_check.c:
##########
@@ -0,0 +1,94 @@
+/****************************************************************************
+ * mm/mm_check/mm_check.c
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.  The
+ * ASF licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the
+ * License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ *
+ ****************************************************************************/
+
+/****************************************************************************
+ * Included Files
+ ****************************************************************************/
+
+#include <nuttx/mm/mm.h>
+#include <nuttx/sched.h>
+#include <nuttx/irq.h>
+#include <nuttx/wqueue.h>
+#include <nuttx/arch.h>
+#include <nuttx/compiler.h>
+#include <nuttx/config.h>
+
+#include <time.h>
+#include <assert.h>
+#include <sys/types.h>
+
+/****************************************************************************
+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#ifndef CONFIG_SCHED_LPWORK
+#error "Low priority work queue is required for the memory health checks."
+#endif
+
+/****************************************************************************
+ * Private Types
+ ****************************************************************************/
+
+static struct work_s work_q;
+
+/****************************************************************************
+ * Private Functions
+ ****************************************************************************/
+
+static void mm_check_worker(FAR void * arg)
+{
+  UNUSED(arg);
+
+  int i;
+  FAR struct tcb_s *tcb;
+  irqstate_t flags;
+  extern FAR struct tcb_s **g_pidhash;
+  extern volatile int g_npidhash;
+
+  for (i = 0; i < g_npidhash; i++)
+    {
+      flags = enter_critical_section();
+
+      tcb = g_pidhash[i];
+
+      if (tcb && ((up_check_tcbstack(tcb) * 100 / tcb->adj_stack_size)
+          > CONFIG_MM_STACK_USAGE_SAFE_PERCENT))
+        {
+          PANIC();
+        }
+
+      leave_critical_section(flags);
+    }
+
+  kmm_checkcorruption();
+
+  work_queue(LPWORK, &work_q, mm_check_worker, NULL,
+             (CONFIG_MM_CHECK_PERIOD * CLOCKS_PER_SEC));
+}
+
+/****************************************************************************
+ * Public Functions
+ ****************************************************************************/
+
+void mm_check_init()

Review Comment:
   ```suggestion
   void mm_check_init(void)
   ```



##########
mm/mm_check/mm_check.c:
##########
@@ -0,0 +1,94 @@
+/****************************************************************************
+ * mm/mm_check/mm_check.c
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.  The
+ * ASF licenses this file to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance with the
+ * License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
+ * WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.  See the
+ * License for the specific language governing permissions and limitations
+ * under the License.
+ *
+ ****************************************************************************/
+
+/****************************************************************************
+ * Included Files
+ ****************************************************************************/
+
+#include <nuttx/mm/mm.h>
+#include <nuttx/sched.h>
+#include <nuttx/irq.h>
+#include <nuttx/wqueue.h>
+#include <nuttx/arch.h>
+#include <nuttx/compiler.h>
+#include <nuttx/config.h>

Review Comment:
   Please make this a first include



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] fjpanag commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
fjpanag commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186292209

   > @fjpanag use the check here should be enough: https://github.com/apache/incubator-nuttx/blob/master/sched/irq/irq_dispatch.c#L177-L183
   
   I wasn't aware that this check is also present there.
   
   Shall I drop this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] acassis commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
acassis commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186279920

   Hi @fjpanag, you forgot to comment this feature shouldn't be used on production. Suggestion: keep it dependent on CONFIG_DEBUG_MM


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] xiaoxiang781216 commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186703810

   > > I wasn't aware that this check is also present there.
   > > Shall I drop this PR?
   > 
   > The check in `irq_dispatch()` only calls `kmm_checkforcorruption()` on IRQ exits to check heaps for TCBs **marked with TCB_FLAG_MEM_CHECK** (and there are no mentions of the flag whatsoever in the master codebase). It also build-depends on `CONFIG_DEBUG_MM`, as @acassis suggests for this checker as well.
   > 
   
   We can refine how to trigger the check. But irq_dispatch is the best place to get the reliable result since if kmm_checkforcorruption report the error, we can ascertain that the interrupted thread corrupt the memory. On the another hand, even LP work detect the memory corruption, it's still hard to identify which thread corrupt the memory.
   
   > I see the PR as still useful because it also checks stacks, and it doesn't require setting a tcb flag. Yes, `nsh> ps` can show stack usage in interactive shell, though it requires STACK_COLORATION and provides virtually a very slow sample rate, as does `stackmonitor`. However, it might conflict / do double work with `irq_dispatch`. The good thing is we're decoupling memory checks from IDLE thread which used to cause problems in #5266, and that issue/PR mentioned something about moving checks to LPWORK thread (half a year ago).
   > 
   
   It's better to enable STACK_CANARIES, ARMV8M_STACKCHECK_HARDWARE or ARMV[7|8]M_STACKCHECK, since they can report the stack overflow immediately.
   
   > @fjpanag How many processes/threads (and pthreads) were running on your STM32F427 board? Did you enable DEBUG_MM as well? Can you somehow verify that there won't be deadlock problems with semaphores of mm? Was there any external (FMC SDRAM) memory attached to system heaps?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] ALTracer commented on pull request #6631: Added memory health checks.

Posted by GitBox <gi...@apache.org>.
ALTracer commented on PR #6631:
URL: https://github.com/apache/incubator-nuttx/pull/6631#issuecomment-1186595988

   > I wasn't aware that this check is also present there.
   > 
   > Shall I drop this PR?
   
   The check in `irq_dispatch()` only calls `kmm_checkforcorruption()` on IRQ exits to check heaps for TCBs **marked with TCB_FLAG_MEM_CHECK** (and there are no mentions of the flag whatsoever in the master codebase). 
   It also build-depends on `CONFIG_DEBUG_MM`, as @acassis suggests for this checker as well. 
   
   I see the PR as still useful because it also checks stacks, and it doesn't require setting a tcb flag. Yes, `nsh> ps` can show stack usage in interactive shell, though it requires STACK_COLORATION and provides virtually a very slow sample rate, as does `stackmonitor`. However, it might conflict / do double work with `irq_dispatch`. The good thing is we're decoupling memory checks from IDLE thread which used to cause problems in #5266, and that issue/PR mentioned something about moving checks to LPWORK thread (half a year ago).
   
   @fjpanag How many processes/threads (and pthreads) were running on your STM32F427 board? Did you enable DEBUG_MM as well? Can you somehow verify that there won't be deadlock problems with semaphores of mm? Was there any external (FMC SDRAM) memory attached to system heaps?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org