You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@nuttx.apache.org by "Matias N." <ma...@imap.cc> on 2020/06/21 17:15:07 UTC

debugging mm_free debugassert()

Hi,
I'm having a weird bug appear during boot. It is being triggered at the first call of mm_free which comes from a call to mtdconfig driver. I have enabled debug information and I'm also debugging at the point before crash but since I don't understand the memory manager I can't tell where this is coming from.
The assertion line is (mm_free.c:185):

DEBUGASSERT((node->preceding & ~MM_ALLOC_BIT) == prev->size);

The output on the console is:

BCDF
mm_initialize: Heap: start=0x200033a8 size=85080
mm_addregion: Region 1: base=0x200033a8 size=85072
mm_malloc: Allocated 0x200033c0, size 400
mm_malloc: Allocated 0x20003550, size 208
mm_malloc: Allocated 0x20003620, size 336
mm_malloc: Allocated 0x20003770, size 272
mm_malloc: Allocated 0x20003880, size 144
mm_malloc: Allocated 0x20003910, size 176
mm_malloc: Allocated 0x200039c0, size 336
mm_malloc: Allocated 0x20003b10, size 304
mm_addregion: Region 2: base=0x10000000 size=32768
mm_malloc: Allocated 0x10000010, size 48
mm_malloc: Allocated 0x10000040, size 48
mm_malloc: Allocated 0x10000070, size 48
mm_malloc: Allocated 0x100000a0, size 48
mm_malloc: Allocated 0x100000d0, size 48
mm_malloc: Allocated 0x10000100, size 640
mm_malloc: Allocated 0x10000380, size 80
mm_malloc: Allocated 0x100003d0, size 80
mm_malloc: Allocated 0x10000420, size 80
mm_malloc: Allocated 0x10000470, size 32
mm_malloc: Allocated 0x10000490, size 240
mm_malloc: Allocated 0x10000580, size 640
mm_malloc: Allocated 0x10000800, size 80
mm_malloc: Allocated 0x10000850, size 80
mm_malloc: Allocated 0x100008a0, size 80
mm_malloc: Allocated 0x100008f0, size 2080
mm_malloc: Allocated 0x10001110, size 240
mm_malloc: Allocated 0x10001200, size 640
mm_malloc: Allocated 0x10001480, size 80
mm_malloc: Allocated 0x100014d0, size 80
mm_malloc: Allocated 0x10001520, size 80
mm_malloc: Allocated 0x10001570, size 2080
mm_malloc: Allocated 0x10001d90, size 336
mm_malloc: Allocated 0x10001ee0, size 32
mm_malloc: Allocated 0x10001f00, size 48
mm_malloc: Allocated 0x10001f30, size 48
mm_malloc: Allocated 0x10001f60, size 48
mm_malloc: Allocated 0x10001f90, size 32
mm_malloc: Allocated 0x10001fb0, size 48
mm_malloc: Allocated 0x10001fe0, size 16
mm_malloc: Allocated 0x10001ff0, size 32
mm_malloc: Allocated 0x10002010, size 48
Mounting procfs to /proc
mm_malloc: Allocated 0x10002040, size 48
mm_malloc: Allocated 0x10002070, size 128
mm_malloc: Allocated 0x100020f0, size 160
mm_malloc: Allocated 0x10002190, size 48
mm_malloc: Allocated 0x100021c0, size 16
mm_malloc: Allocated 0x100021d0, size 64
mm_malloc: Allocated 0x10002210, size 48
mm_malloc: Allocated 0x10002240, size 48
mm_malloc: Allocated 0x10002270, size 32
mm_malloc: Allocated 0x10002290, size 48
mm_malloc: Allocated 0x100022c0, size 48
mm_malloc: Allocated 0x100022f0, size 32
mm_malloc: Allocated 0x10002310, size 48
mm_malloc: Allocated 0x10002340, size 32
mm_malloc: Allocated 0x10002360, size 48
mm_malloc: Allocated 0x10002390, size 16
mm_malloc: Allocated 0x100023a0, size 32
mm_malloc: Allocated 0x100023c0, size 48
mm_malloc: Allocated 0x100023f0, size 32
mm_malloc: Allocated 0x10002410, size 48
mm_malloc: Allocated 0x10002440, size 48
mm_malloc: Allocated 0x10002470, size 48
mm_malloc: Allocated 0x100024a0, size 688
mm_malloc: Allocated 0x10002750, size 32
mm_malloc: Allocated 0x10002770, size 32
mm_malloc: Allocated 0x10002790, size 80
mm_malloc: Allocated 0x100027e0, size 240
mm_malloc: Allocated 0x100028d0, size 640
mm_malloc: Allocated 0x10002b50, size 32
mm_malloc: Allocated 0x10002b70, size 80
mm_malloc: Allocated 0x10002bc0, size 80
mm_malloc: Allocated 0x10002c10, size 80
mm_malloc: Allocated 0x10002c60, size 2080
mm_malloc: Allocated 0x10003480, size 80
mm_free: Freeing 0x10003480

And during debug I see this:

Where preceeding (after removing the MM_ALLOC_BIT) is zero, thus fails the assertion.

Any hints on what could this be?

Best,
Matias

Re: debugging mm_free debugassert()

Posted by Gregory Nutt <sp...@gmail.com>.
> Ok, I will try that.
> If I enable stackchecking should this kind of stack overrun be catched?

Probably.  The assertion output should should the stack usage of every 
active task in that case.

There are lots of ways that memory corruption can occur.  Stack overrun 
is just one common cause.  And one that would make sense in this case 
because it seems only to be effecting your configuration.


Re: debugging mm_free debugassert()

Posted by "Matias N." <ma...@imap.cc>.
Ok, I will try that.
If I enable stackchecking should this kind of stack overrun be catched?

On Sun, Jun 21, 2020, at 14:22, Gregory Nutt wrote:
> 
> > The most common cause... in fact the only cause that I am aware of ... 
> > for heap failure is memory corruption. 
> 
> And a very common cause of heap memory corruption is a task that has 
> overrun its stack and clobber the meta data at the beginning of the next 
> chunk.
> 
> A simple thing you can do is to bump up stack sizes to assure this is 
> common case is not the cause of your problem.
> 
> 

Re: debugging mm_free debugassert()

Posted by Gregory Nutt <sp...@gmail.com>.
> The most common cause... in fact the only cause that I am aware of ... 
> for heap failure is memory corruption. 

And a very common cause of heap memory corruption is a task that has 
overrun its stack and clobber the meta data at the beginning of the next 
chunk.

A simple thing you can do is to bump up stack sizes to assure this is 
common case is not the cause of your problem.


Re: debugging mm_free debugassert()

Posted by Gregory Nutt <sp...@gmail.com>.
> I'm having a weird bug appear during boot. It is being triggered at 
> the first call of mm_free which comes from a call to mtdconfig driver. 
> I have enabled debug information and I'm also debugging at the point 
> before crash but since I don't understand the memory manager I can't 
> tell where this is coming from.
> The assertion line is (mm_free.c:185):
>
> DEBUGASSERT((node->preceding & ~MM_ALLOC_BIT) == prev->size);

The most common cause... in fact the only cause that I am aware of ... 
for heap failure is memory corruption.

I normally debug this bay calling mallinfo() has selected places.  If 
the heap has been corrupted mallinfo() will fail. With a little effort 
you can find the cause of the failure between the last good call to 
mallinfo() and the first failed call to mallinfo().


Re: debugging mm_free debugassert()

Posted by "Matias N." <ma...@imap.cc>.
Ok, managed to fit binary in flash without optimizations and now it runs. It would seem that stack checking + optimizations is not a good combination.

On Sun, Jun 21, 2020, at 14:39, Matias N. wrote:
> Thanks, I increased the stack size and indeed it worked. 
> I'd really like to try the stack checking feature since I hate guessing the stack size. Always leads to these kind of hard to find errors. 
> 
> I just enabled it and had to enable size optimization for the build otherwise the binary would exceed the flash size (I'm using the instrument-functions flag). However, with this enabled the system gets to print ABCDF and then hangs. When pausing execution with the debugger it is hung here:
> 
> 
> 
> Is it expected that stack checking may fail if size optimization is turned on?
> 
> On Sun, Jun 21, 2020, at 14:33, Gregory Nutt wrote:
>> 
>>> mm_free: Freeing 0x10003480
>> Notice that node has the value 0x10003478. So that is the 8-byte header at the beginning of each chunk, specifically of the chunk that is being freed. So you an assume that 0x10003478 was valid when 0x10003480 was allocated, but corrupted sometime before it was freed.

>>> 
>> 
>> 
>> *Attachments:*
>>  * image.png
> 
> 
> *Attachments:*
>  * image.png
>  * image.png

Re: debugging mm_free debugassert()

Posted by "Matias N." <ma...@imap.cc>.
Perfect, will try that.

On Sun, Jun 21, 2020, at 15:32, Gregory Nutt wrote:
> There are other ways to monitor stack size. The STACK COLORATION logic is much simpler than the stack check option. The stack check option if more precise; the stack coloration options is lower overhead.
> 
> The stack coloration logic also enables other useful features. For example, the NSH 'ps' command will also should stack usage. There is also an optional utility at apps/system/stackmonitor that will periodically show stack usage for all tasks.
> 
> Greg
> 
> On 6/21/2020 11:39 AM, Matias N. wrote:
>> Thanks, I increased the stack size and indeed it worked. 
>> I'd really like to try the stack checking feature since I hate guessing the stack size. Always leads to these kind of hard to find errors. 
>> 
>> I just enabled it and had to enable size optimization for the build otherwise the binary would exceed the flash size (I'm using the instrument-functions flag). However, with this enabled the system gets to print ABCDF and then hangs. When pausing execution with the debugger it is hung here:
>> 
>> 
>> 
>> Is it expected that stack checking may fail if size optimization is turned on?
>> 
>> On Sun, Jun 21, 2020, at 14:33, Gregory Nutt wrote:
>>> 
>>>> mm_free: Freeing 0x10003480
>>> Notice that node has the value 0x10003478. So that is the 8-byte header at the beginning of each chunk, specifically of the chunk that is being freed. So you an assume that 0x10003478 was valid when 0x10003480 was allocated, but corrupted sometime before it was freed.

>>>> 
>>> 
>>> 
>>> *Attachments:*
>>>  * image.png
>> 
> 

> 
> *Attachments:*
>  * image.png
>  * image.png

Re: debugging mm_free debugassert()

Posted by Gregory Nutt <sp...@gmail.com>.
There are other ways to monitor stack size.  The STACK COLORATION logic 
is much simpler than the stack check option.  The stack check option if 
more precise; the stack coloration options is lower overhead.

The stack coloration logic also enables other useful features.  For 
example, the NSH 'ps' command will also should stack usage.  There is 
also an optional utility at apps/system/stackmonitor that will 
periodically show stack usage for all tasks.

Greg

On 6/21/2020 11:39 AM, Matias N. wrote:
> Thanks, I increased the stack size and indeed it worked.
> I'd really like to try the stack checking feature since I hate 
> guessing the stack size. Always leads to these kind of hard to find 
> errors.
>
> I just enabled it and had to enable size optimization for the build 
> otherwise the binary would exceed the flash size (I'm using the 
> instrument-functions flag). However, with this enabled the system gets 
> to print ABCDF and then hangs. When pausing execution with the 
> debugger it is hung here:
>
>
>
> Is it expected that stack checking may fail if size optimization is 
> turned on?
>
> On Sun, Jun 21, 2020, at 14:33, Gregory Nutt wrote:
>>
>>> mm_free: Freeing 0x10003480
>>
>> Notice that node has the value 0x10003478.  So that is the 8-byte 
>> header at the beginning of each chunk, specifically of the chunk that 
>> is being freed.  So you an assume that 0x10003478 was valid when 
>> 0x10003480 was allocated, but corrupted sometime before it was freed.
>>
>>>
>>
>>
>> *Attachments:*
>>
>>   * image.png
>>
>


Re: debugging mm_free debugassert()

Posted by "Matias N." <ma...@imap.cc>.
Thanks, I increased the stack size and indeed it worked. 
I'd really like to try the stack checking feature since I hate guessing the stack size. Always leads to these kind of hard to find errors. 

I just enabled it and had to enable size optimization for the build otherwise the binary would exceed the flash size (I'm using the instrument-functions flag). However, with this enabled the system gets to print ABCDF and then hangs. When pausing execution with the debugger it is hung here:



Is it expected that stack checking may fail if size optimization is turned on?

On Sun, Jun 21, 2020, at 14:33, Gregory Nutt wrote:
> 
>> mm_free: Freeing 0x10003480
> Notice that node has the value 0x10003478. So that is the 8-byte header at the beginning of each chunk, specifically of the chunk that is being freed. So you an assume that 0x10003478 was valid when 0x10003480 was allocated, but corrupted sometime before it was freed.

>> 
> 
> 
> *Attachments:*
>  * image.png

Re: debugging mm_free debugassert()

Posted by Gregory Nutt <sp...@gmail.com>.
> mm_free: Freeing 0x10003480

Notice that node has the value 0x10003478.  So that is the 8-byte header 
at the beginning of each chunk, specifically of the chunk that is being 
freed.  So you an assume that 0x10003478 was valid when 0x10003480 was 
allocated, but corrupted sometime before it was freed.